ABSTRACT

This report offers advice on the issues to be 
considered and the steps to be taken when implementing a high school
graduation test. The research was conducted, specifically, to address 
problems with Mississippi's high school exit test. An external panel, 
developed by Southeastern Regional Vision for Education (SERVE), 
reviewed data obtained through a site visit, interviews, and document 
analysis. Chapter 1 presents introductory information, chapter 2 
provides an overview of the Mississippi context, chapter 3 contains 
an executive summary, and the final chapter contains the full report. 
The following issues are addressed: curriculum/test specification 
issues; additional curriculum and instructional considerations;
psychometric testing and scoring; education issues; legal issues;
policy/administrative issues; and human and financial resource 
issues. Suggestions are also offered for the sequencing of tasks and 
using test scores for accreditation purposes. The report contains 65 
recommendations, including these: (1) it is legally inappropriate to
hold students accountable for passing an assessment that covers 
materials they have not been taught; (2) multiple-choice items can 
measure higher order thinking skills and procedures; (3) any 
"off-the-shelf" test would probably be an unacceptable high school 
exit test for Mississippi students; (4) requiring any national
norm-referencing component of the exit exam poses problems for 
maintaining curricular validity; (5) the various assessment programs 
should be closely articulated; and (6) the use of various tests in a
performance-based accreditation model requires careful consideration 
of how to set the performance level and what metric to use. 
Information for ordering SERVE products is included. (LMI) 
About the SERVE Laboratory



SERVE, THE SOUTHEASTERN REGIONAL VISION FOR EDUCATION, IS A COALITION OF EDUCATORS,
business leaders, governors, and policymakers who are seeking comprehensive and lasting improvement
in education in Alabama, Florida, Georgia, Mississippi, North Carolina, and South Carolina. The name of the 
Laboratory reflects a commitment to creating a shared vision of the future of education in the Southeast. 

The mission of SERVE is to provide leadership, support, and research to assist state and local efforts in improving 
educational outcomes, especially for at-risk and rural students. Laboratory goals are to address critical issues in the 
region, work as a catalyst for positive change, serve as a broker of exemplary research and practice, and become an 
invaluable source of information for individuals working to promote systemic educational improvement. 

Collaboration and networking are at the heart of SERVE's mission; the laboratory's structure is itself a model of 
collaboration. The laboratory has four offices in the region to better serve the needs of state and local education 
stakeholders. SERVE's Greensboro office manages a variety of research and development projects that meet 
regional needs for the development of new products, services and information about emerging issues. The devel- 
opment of this manual was funded through such an R&D effort. The laboratory's information office is located in 
Tallahassee. Field services offices are located in Atlanta, Greensboro, Tallahassee, and on the campus of Delta State 
University in Cleveland, Mississippi. 

To request publications or to join the SERVE mailing list and receive announcements about laboratory publica- 
tions, contact the SERVE office in Tallahassee (address below). 



SERVE-Alabama

50 N. Ripley Street 
Gordon Persons Building 
Montgomery, AL 36130 
334-242-9758 
Fax 334-242-9708 



SERVE-Florida 

345 South Magnolia Drive 
Suite D-23 

Tallahassee, FL 32301 
Lab 

904-671-6000 
800-352-6001 
Fax 904-671-6020 



SERVE-Georgia 

41 Marietta Street, NW 
Suite 1000 
Atlanta, GA 30303 
404-577-7737 
800-659-3204 
SERVE-Line 800-487-7605 
Fax 404-577-7812 



SERVE-Mississippi

Delta State University 
Box 3183 

Cleveland, MS 38732 
601-846-4384 
800-326-4548 
Fax 601-846-4402 



SERVE-North Carolina 

201 Ferguson Building 
UNCG Campus 
P.O. Box 5367 
Greensboro, NC 27435 
910-334-3211 
800-755-3277 
Fax 910-334-3268 



SERVE-South Carolina 

1429 Senate Street 
1008 Rutledge Building 
Columbia, SC 29201
803-734-4110 
Fax 803-734-3389 



Clearinghouse 
800-352-3747 



Math Science Consortium
904-671-6033 
800-854-0476 
Fax 904-671-6010 
Introduction



By Wendy McColskey, Research Program Manager, SERVE 



SERVE WORKS WITH STATES, DISTRICTS,
and schools to improve educational
practices and outcomes. SERVE provides 
a variety of products and services to the 
region. Of particular relevance to this 
report is the work accomplished in the area of 
student assessment. The lab's work in this area has 
included a manual for teachers entitled How to
Assess Student Performance in Science: Going Beyond
Multiple-Choice Tests, workshops and conferences 
for teachers to promote awareness of alternative 
assessment options, and participation with the 
other nine regional labs in the development of a 
Database of Alternative Assessments in Math and 
Science, and a Toolkit for Professional Developers in 
Alternative Assessment in Math and Science.

In addition to these district and school level prod- 
ucts and services, SERVE has also supported state 
assessment directors in the Southeast. SERVE 
sponsors biannual meetings of the six state assess- 
ment directors to provide a forum for them to 
discuss and share issues they are facing. The advent
of alternative assessment and the whole discussion
about moving all students to higher levels of
academic performance require that curriculum,
assessment, staff development, accreditation,
Special Education, and other areas work closely
together in implementing changes in assessment
programs. Often, there is little time and opportunity
for these staff members to meet and hear from
experts in the field. The biannual assessment
meetings have provided an opportunity for such
communication about assessment topics.



Borrowing from 
NCREL 

ONE ADVANTAGE OF THE REGIONAL 
laboratory system is that labs can build 
upon the expertise found in other labs.
This report resulted from the example set by the
Regional Policy Information Center (RPIC) of the
North Central Regional Educational Laboratory
(NCREL). NCREL offers a series of policy papers
on high stakes assessment (i.e., "the use of test 
results to make important decisions about the test 
taker"). One of these publications was entitled, 
"Issues and Recommendations Regarding Imple- 
mentation of High School Graduation Tests" which 
included a report by a panel committee chaired by 
Dr. William A. Mehrens of Michigan State University,
on the application of curricular, psychometric,
educational, legal, administrative, and resource 
requirements for graduation tests to the Michigan 
context. The report was a result of a request from
the state for advice on the implementation of a 
legislative act requiring a high school graduation 
test. 

In the Preface to the Michigan Report 
it is stated: 

"Certainly it is possible to develop a high school
graduation test that meets curricular, psychometric,
educational, legal, administrative, and resource
requirements. However, as this document makes 
clear, the task is not easy and timelines are fre- 
quently tight. For the task to be done well, a variety 
of steps need to be taken soon after any legislative
enactment. Immediate funding will be needed to 
ensure adequate human and fiscal resources. Only 
with appropriate funding to complete the task will
a high school test graduation requirement be of
service to the citizens of a state." (pp. 16-18) 

In the Executive Summary, Dr. Linda Ann Bond of 
NCREL provides the context for the current interest 
among states in implementing high school gradua- 
tion tests. 

"A new wave of educational reform in the 1990s has
brought with it a resurgence of interest in high
school graduation tests, but the types of skills that
are now deemed essential to success have changed.
Instead of holding students to "minimal" skills,
these new mandates are intended to raise standards
beyond minimal levels of achievement. Current
thinking suggests that to be successful in today's
technologically advanced workplace, high school
graduates need skills that used to be reserved for the
college-bound. Minimum competencies are not
enough. Many policymakers today look to graduation
tests to raise the high school graduate's skills
and knowledge to the higher level expected for
success in a complex, demanding society and
workplace." (p. 7)

She concludes that: "Because a high school gradua- 
tion test carries with it such high stakes, careful 
attention to the soundness of the test design process 
and to the legal defensibility of the test product is of 
critical importance." (p. 7)

The Request 
from Mississippi 
and the Response 
from SERVE 




STATE-MANDATED TESTS REPRESENT
targets set for students to achieve. As the quote
from Dr. Bond suggests, these targets are
moving targets. That is, expectations articulated for
high school graduates in the 1970s with the first
wave of high school exit tests may be different from
those needed by high school graduates in the 1990s.
A number of states are in the process of upgrading
high school minimal competency tests developed a
decade ago. Mississippi is one of those states.
Dr. Cindy Ward, the Director of Student Assessment
in Mississippi, had read the NCREL report
and, being one year into the complexities of planning
for the upgraded high school exit test, felt that
a panel review of the state's status could be very helpful
as a way of ensuring that it was meeting the
necessary curricular, psychometric, educational,
legal, administrative, and resource requirements of
a sound test development process. She approached
SERVE about sponsoring such a review.

SERVE agreed to fund a Panel's review for the
benefit of Mississippi's future generations who will
be taking the test and for what could be learned
that might help other states. Dr. Mehrens agreed to
chair the panel of experts. Dr. Mehrens is a Professor
of Educational Measurement and a nationally
known expert in his field. He has recently been
elected vice-president for the Division of Measurement
and Research Methodology of the American
Educational Research Association and is a past
president of the National Council on Measurement
in Education.

SERVE identified panel members representing a 
wide range of experience to make a site visit to 
gather information and to draft the report. Several 
others who were not available to make the site visits 
agreed to review and comment on drafts of the 
report. 

• Two panel members, Dr. Roger Trent, the
director of testing in Ohio, and Dr. Sharon
Johnson-Lewis, the director of Planning,
Research, and Evaluation for Detroit Public
Schools, had been part of the team chaired by
Dr. Mehrens which had written the report for
Michigan in the NCREL document.

• The panel members had a wealth of state
testing experience, including the legal challenges
posed by graduation tests. State testing
directors from Ohio (Dr. Roger Trent), Louisiana
(Ms. Rebecca Christian), Florida (Dr. Tom
Fisher), and Maryland (Dr. Robert Gabrys)
participated.

• The testing expertise was balanced by curriculum
expertise in the form of Mr. Lane Peeler
(South Carolina Department of Education),
who had been involved in the development of
curriculum frameworks in South Carolina, and
Dr. Barbara Kapinus (Council of Chief State 
School Officers), who has worked closely with 
states in reviewing the implications of national 
content standards. In addition, Dr. Susan 
Barnes (Texas Education Agency) brought 
measurement and policy expertise through her 
work with the design and implementation of 
personnel performance assessment. 

We wish to thank all of these panel participants for 
their willingness to take time out of their busy 
schedules to contribute their expertise to this 
project. 



The Report and 
Its Use 

AFTER THE TWO-DAY SITE VISIT, INCLUDING
extensive interviews and
analyses of relevant documents, Dr.
Mehrens, with the assistance of the Panel, wrote an
extremely informative, readable, and thorough
report which applied their extensive expertise in
good test development requirements to the Mississippi
context, with discussions of issues and recommendations
for solutions. We offer the full report
in Chapter 4. Chapter 3 is the report's Executive
Summary. Chapter 2 was adapted from the introduction
written by the Panel and sets the Mississippi
context for the report. The report concluded
with the following:

• It is legally inappropriate to hold students
accountable for passing an assessment that
covers material that they have not been taught.
This makes using a high-stakes graduation
assessment to drive curricular change somewhat
troublesome. One can use the announcement
of an upcoming assessment to drive
curricular change. This, of course, requires that
there be considerable time between the announcement
of the assessment and its implementation.

• Multiple-choice items can measure higher-order
thinking skills and procedures. Performance
assessments may not offer high enough
psychometric qualities to be used for high
stakes assessments.

• It is unlikely that any "off-the-shelf" test would
be an acceptable high school exit test for the 
students of Mississippi. 

• Requiring any national norm-referencing 
component of the exit exam would complicate 
the task of maintaining curricular validity for 
the test. 

• There must be close articulation among the 
various assessment programs. They should not 
work at cross purposes, and if they are serving 
the same purposes, perhaps less assessment is 
needed. 

• The use of the various tests in a performance- 
based accreditation model requires careful 
thought regarding how to set the performance 
level and what metric to use in setting the level 
(e.g., average performance or percentage of 
students above some cut score). 
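
The difference between the two metrics named in the last recommendation can be illustrated with a minimal sketch. The scores, the cut score of 60, the function names, and the choice to count a score equal to the cut as passing are all hypothetical assumptions made for illustration; none of them comes from the Panel's report. The sketch only shows that two schools with the same average can have very different percentages of students at or above a cut score.

```python
from statistics import mean


def average_performance(scores):
    """Return the mean score for a school (hypothetical metric 1)."""
    return mean(scores)


def percent_at_or_above_cut(scores, cut_score):
    """Return the percentage of students scoring at or above the cut (hypothetical metric 2)."""
    passing = sum(1 for score in scores if score >= cut_score)
    return 100.0 * passing / len(scores)


# Invented scores for two schools; both sets sum to 360 over six students.
school_a = [58, 59, 61, 62, 60, 60]   # scores clustered near the cut
school_b = [30, 35, 90, 95, 55, 55]   # scores widely spread
CUT_SCORE = 60                        # assumed cut score, for illustration only

for name, scores in (("School A", school_a), ("School B", school_b)):
    print(name,
          "average:", average_performance(scores),
          "percent at or above cut:", round(percent_at_or_above_cut(scores, CUT_SCORE), 1))
```

Under these assumptions, an accreditation rule based on the average treats the two schools identically, while a rule based on the percentage at or above the cut score does not, which is one reason the choice of metric deserves explicit discussion.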

The report was produced by the Panel in January
of 1995. Because it was so "thorough", including 65
recommendations relative to curriculum, assessment,
professional development, remediation, and
accreditation, it was difficult for the assessment
unit to decide how to involve others in the department
in discussing the issues, especially since there
was no existing interdepartmental team that dealt
with assessment planning issues. Because of the
number of decision points in the report, SERVE
worked with the assessment department to summarize
the major issues (Recommendations) in a
matrix format (Table 1). This information along
with the timeline (Table 2) suggested by the Panel
for test implementation were proposed to the
superintendent as agenda items for a meeting of
involved departmental program directors.

As can be seen from reviewing Table 1, it would be
extremely difficult, given the highly complex and
interrelated issues involved in developing a "high
stakes" exit test, for an assessment director to
personally communicate to and educate others
about the implications of all the decision points.
The discussion guide offered a concrete way of
helping the assessment staff to begin to get the
report recommendations on the table for the
department.

An interdepartmental meeting of directors of 
Instructional Development, Accreditation, Alterna- 
tive Education, Student Assessment, Title I, and 
Tech Prep was held in April 1995 to examine and 
discuss the key issues presented by the Panel 
Report. The SDE Interdepartmental Team met 
again on April 14th 1995 to discuss the issues with 
Dr. Mehrens, the Chair of the Panel. Thus, the 
report has provided a concrete means for the key 
state department players in the systemic reform 
process of upgrading expectations for student 
performance to come together. The importance of 
getting issues thoroughly discussed by such a team 
should not be underestimated. Such discussions 
early on will pay tremendous benefits in terms of 
the implementation of an exit test that accom- 
plishes its objectives. 

Mississippi 
Response to the 
Report 

by Dr. Ward, Director of Student Assessment,
Mississippi State Department of Education

THE DECADE OF THE 1980S, THOUGH
touted as the decade of significant educational
reform, established minimum
standards of academic competency for students
throughout the nation, and Mississippi was no
exception. The Functional Literacy Examination,
commonly known as the FLE, was built around
specific skills identified in the Mississippi Curriculum
Structure. As mandated in legislation, successful
completion of this test has been required for
high school graduation from Mississippi public
schools since 1987. Careful attention was given to
precise test development of the FLE to minimize
potential legal issues which surround such a high
stakes test. The test has succeeded well in fulfilling
its mandate, but as desired, the passing rate on this
test each year is very high. Therefore, the current
FLE has been viewed by many educators and
especially students as having outlived its usefulness.

At first glance, moving to a new high school exit 
test seemed easy enough to many. Some policymak- 
ers who were committed to the rapid improvement 
of instruction in Mississippi classrooms, had 
difficulty with the absence of visible action by both 
staff and a committee of practitioners charged 
with establishing a new exit test. After all, other
components of the new Mississippi Assessment 
System, recommended by the Superintendent's 
Task Force on Accountability and Learning, 
including a norm-referenced test with constructed 
response items, were piloted in the fall of 1994. 

Several committees, each charged with the respon- 
sibility of recommending implementation strate- 
gies for specific areas studied by the 
Superintendent's Task Force, were advancing with 
their work. However, curriculum revisions at 
various levels, along with the changes occurring 
from the implementation of technology into the 
educational system of Mississippi, promoted 
greater, more complex issues than those handled by 
other committees. 

Especially challenging to the high school exit test
update issue is the increasing number of assess- 
ments contained in the Mississippi Assessment 
System. In a state which frequently ranks near the 
bottom of numerous educational indicator lists, 
accountability takes on an even more profound 
nature. Results of assessments in Mississippi have 
demonstrated that assessment has had an impact 
on student performance. Yet one of the cries from 
educators, especially teachers, was the amount of 
time spent on testing with statewide assessments. 
Faced with many critical issues related to imple- 
menting a high school exit assessment, it seemed 
timely and appropriate in the fall of 1994 to seek 
assistance, with SERVE's help, from knowledgeable
professionals outside of Mississippi.

The External Review Panel Report, which was
produced from this outside review process, is
comprehensive and thorough, considering only
minor areas of disagreement which are inherent in
the manner in which the data for the report was
obtained. Of importance is the extent to which the
report identified and addressed the same issues,
concerns, and problems that the High School Exit
Assessment Implementation Committee has encountered.
Further, the level of attention given to
the interrelatedness of all assessments in the Missis- 
sippi System is significant. The report has fostered 



communication about important issues among 
internal agency offices and external parties. It has 
become the catalyst for continued activities related 
to the challenging tasks remaining in the develop- 
ment of a new high school exit assessment. ♦ 




Issues in Implementing a Successful High School Exit 
Assessment: A Discussion Guide for Interdepartmental 
Planning 



Curriculum/ Test 
Specification Issues 

What standards and competencies 
will be assessed? 

• Readiness of curriculum frameworks to
provide a basis for establishing high school exit
competencies.

• Tension between assessment that drives reform
and ensuring the opportunity to learn.

• Relationship between higher expectations and
practical costs of more remediation.

• Specificity of curriculum frameworks to provide
sufficient direction for test specifications.

• Need opportunity-to-learn data from students
and educators prior to first pilot and again at
the time of the first real administration.

• Publication of competencies to be tested
(notification/due process).

• Professional development assistance for
teachers to understand competencies to be
assessed.

Standards for 
Educational and 
Psychological Testing 

What decisions about test 
development/selection need to be 
made? 

• Customized vs. in house. 

• Who has the authority to review/accept test
specifications for use in the RFP? 

• Is there an in-state content review team to
audit the work of a contractor? 

• Should a technical advisory committee offer
statewide opportunities to discuss advantages
and disadvantages of multiple-choice vs.
performance assessment including cost
estimates?

What is a reasonable timeline to 
ensure the opportunity to learn? 

• First pilot-Fall 1996 

• Second pilot-Fall 1997 

Should failure rate be shared with 
the districts? 

• First real administration-Fall 1998 

• First class affected-2001 (sample attached)
What decisions about scoring/
reporting need to be made? 

Train a standard-setting committee. 

Will first administration or pilot data be used to 
get cut scores? 

Establish a technical advisory committee. 

Consider phased-in cut scores. 

Establish an item sensitivity committee (bias 
review). 

Discuss test vs. subtest reporting for diagnostic 
purposes. 

Use a technical advisory committee to assist 
with equating. 

How will personnel be trained to administer 
the test?

Consider random auditing of administration 
process. 



Human Resource 
Issues 

What recommendations should be 
made about fiscal needs? 

Realistic timeline 
Additional staff
Committees needed
Testing Policy Advisory Committee
Item Sensitivity Review Committee
Technical Advisory Committee
Three Content Review Committees
Three Cut Score Committees



Legal Issues 

What are the liability issues involved 
for committee members, teachers, etc.? 

• What documentations/other policies need to 
be reviewed or put in place? 

• Should a new code be written? 



Accreditation Issues 

What are some of the accreditation 
issues that need discussion? 

• Scaled average score and incompatible incen- 
tives 



• Percent vs. average 

• First try/cumulative percentage

• Alignment and weighting of standards 
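
The "first try/cumulative percentage" question above can be made concrete with a small sketch. The student records, identifiers, and function names are hypothetical and serve only to illustrate the distinction; they are not taken from Mississippi data or from the Panel's report. Each student is represented by a list of attempt outcomes in order, where True means the student passed on that attempt.

```python
def first_try_pass_rate(attempts_by_student):
    """Percentage of students who passed on their first attempt."""
    passed_first = sum(1 for attempts in attempts_by_student.values() if attempts[0])
    return 100.0 * passed_first / len(attempts_by_student)


def cumulative_pass_rate(attempts_by_student):
    """Percentage of students who passed on any attempt so far."""
    passed_ever = sum(1 for attempts in attempts_by_student.values() if any(attempts))
    return 100.0 * passed_ever / len(attempts_by_student)


# Invented records: outcomes of successive attempts for four students.
records = {
    "student_1": [True],                 # passed on the first try
    "student_2": [False, True],          # passed on the second try
    "student_3": [False, False, True],   # passed on the third try
    "student_4": [False, False],         # has not yet passed
}

print("First-try pass rate:", first_try_pass_rate(records))    # 25.0
print("Cumulative pass rate:", cumulative_pass_rate(records))  # 75.0
```

In this invented example the same school reports 25 percent or 75 percent depending on which figure is used, so an accreditation model would need to state which one it means.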

Coordination Issues 

• Subject matter experts should study and report
on the articulation of the entire testing program.

• Consider whether passing other tests are
alternatives to certain MAAP tests.

• Consider whether there is too much testing.
Developing the Test: A Suggested Sequencing of Tasks



[Timeline chart: Tasks 1-20 sequenced by season (Sp, Su, F, W), 1995 through 1998.]

Task 1: Establish appropriate advisory committees.
    Department of Education Steering Committee
    Testing Policy Advisory Committee
    Item Sensitivity Review Committee
    Technical Advisory Committee
    Content Review Committees
    Standard Setting Committees
Task 2: Determine what standards will be assessed.
Task 3: Disseminate information about Task 2.
Task 4: Complete test specifications for each test area.
Task 5: Hire a contractor for development of resources.
Task 6: Completion of contractor's work on resources.
Task 7: Content Committee review and revisions.
Task 8: Camera-ready copy for field testing.
Task 9: Field test items on Grade 10 students.
Task 10: Prepare and disseminate sample test items and descriptive information for teachers, students,
and parents.
Task 11: Develop rules governing testing procedures.
Task 12: Analyze feedback from first field test.
Task 13: Conduct second field test.
Task 14: Revise items from the second field test.
Task 15: Select operations contractor for scoring.
Task 16: Conduct regional seminars on testing procedures.
Task 17: Complete production for first tests.
Task 18: Administer first test to Grade 10 students.
Task 19: Score and analyze results of first test.
Task 20: Design plan for releasing results to the public.



Task 21: Review and repeat steps above. This step is a continual process.
Background
Information 
for the Report 



THE PURPOSE OF THE REPORT IS
to offer readers advice on the issues
that need to be considered (and
resolved) and the steps that need to be
taken when implementing a high
school graduation test. The report also discusses
some advantages and disadvantages of potential
decisions. However, because Mississippi has other
statewide assessments, some currently operating 
and others being planned, and because these other 
assessments interact with the high school gradua- 
tion test, the External Review Panel commented to 
some extent on those other programs. 

Mississippi 
Context 

OBVIOUSLY, THE ADVICE GIVEN
within the report is based on the Panel's
understanding of the context that exists
within Mississippi at the current time. Their
understandings about the current context were
that:

• A state code currently exists regarding the
Statewide Testing Program. Relevant sections
of that Code are sections H7-HH, 1*>?-U)-'A M-U\-l.:,
37-16-5, 37-16-7, 37-16-9, and 37-16-11.

• In the fall of 1992, the State Superintendent of
Education convened the Superintendent's Task
Force on Assessment for Accountability and
Learning and charged them with designing a
system of assessments to serve accountability
and individual assessment to meet the individual
instructional needs of students. The
assessments were to be designed in such a way
that education would have no alternative but to
change dramatically. The Superintendent was
interested in implementing three major
initiatives: a new assessment system, technical
preparation programs, and professional development.

• The Superintendent's Task Force's report and
recommendations were turned over to an
Implementation Task Force. The implementation
process is being carried out by three
committees as follows: Norm-Referenced
Assessment Implementation Committee,
Workplace Competency/Employability
Assessment Implementation Committee, and
High School Exit Exam Implementation
Committee. There exists an Overall Implementation
Steering Committee that consists of the
chair of the original task force, and the chair
and co-chair of each implementation committee
named. To date, several committee reports
have been issued.

Currently, state testing in Mississippi consists of the
following:

• The ITBS and some Riverside-produced and
scored performance assessment exercises being
administered in grades 3-8 (for reading, language
arts, and mathematics) and the Tests of
Achievement and Proficiency (TAP) in grade 9
for integrated Language Arts and mathematics
(pilot study this year, to be implemented in the
1995-1996 school year). These tests are to be
given in the fall and used for instructional
improvement and school accountability
(accreditation, not broad accountability).

• The Functional Literacy Exam, which is administered
first in the spring to students in the 11th
grade and is used both for a graduation requirement
and a school accountability requirement.
This is a test of basic skills in reading,
written communication, and mathematics. A
very high percentage (approximately 94%) of
the students pass this test on the first attempt.
This test is tentatively scheduled to continue at
least through the 1995-1996 school year.

• The proposed Mississippi Assessment of
Academic Proficiency (MAAP) (to replace the
FLE). This test is to cover reading, mathematics
and written communication and is to eventually
be used for a graduation requirement and
an accountability requirement. A set of possible
instruments from different vendors was
ideally (according to a "High School Exit
Assessment Implementation Table") to be
ready for a first pilot test to be given to 10th
graders in the Spring of 1995. A test was to be
selected from this set and given for standard
setting purposes in the Fall of 1995. The new
MAAP is not currently scheduled to "count" for
high school graduation requirements or
accreditation purposes until the Fall of 1996
(which would be the graduating class of 1999).

• The Subject Area Testing Program (for all 
students enrolled in the subject area that is 
tested). Currently a test exists in Algebra I, and 
there are plans to develop one in biology and 
one for U.S. History from the year 1877. The 
algebra test needs to be revised to fit the new 
curriculum structures. These tests are to be 
used for accountability purposes, not for 
making judgments about individual students. 

• Occupational skills tests given to vocational
completers. Currently these tests are being
piloted in 15 sites and will be used for compliance
with federal requirements.



• Workplace competencies to be given to stu- 
dents in grade 12. Currently the state is under 
contract with ACT to use four parts of Work- 
Keys (Reading for Information, Applied 
Mathematics, Writing, and Locating Informa- 
tion), and pilot administration is to begin in the 
Spring of 1995. The current plan is for scores on
these tests to eventually be used in the account- 
ability (accreditation) process. 

The charge to the Panel primarily pertained to the 
new MAAP exam (to be required for graduation).
However, the total context was relevant to their task. 
Consequently, several sections of the report will
include comments related to the other portions of 
the statewide assessment. 

Outline of the 
Report 

THE FULL REPORT (CHAPTER 4) HAS A
short section (Section I) reviewing and 
evaluating existing legislation and policies 
related to the Functional Literacy Exam (FLE). In 
Section II, the Panel reported on the complex issues 
that must be faced during the planning and imple- 
mentation stages of a high school exit exam. The 
report calls attention to a series of issues, then 
recommends solutions to some of them. In writing 
this section, the panel tried to reference current 
procedures for the FLE because many of the 
procedures would be the same for the proposed 
MAAP, or indeed any exit examination. Neverthe- 
less, the panel acknowledged that they were prob- 
ably not fully aware of all current procedures and, 
therefore, may in parts be offering advice that the 
Mississippi Department of Education has already 
implemented. They hoped that the Department 
personnel would feel complimented rather than 
offended by suggestions about procedures and
policies already in place. 

Section III provides an overview of some of the 
important steps to be considered in developing and 
implementing a high school graduation test and 
suggests when these steps need to be taken. Again, 
parts of this section may be offering suggestions 
that are already being implemented with the FLE
and can simply be adapted for the MAAP.






Section IV relates to issues that should be considered
when using test results for accreditation
purposes. Although the Panel did not receive a
specific request for this section, they felt that
discussions about the use of test results were 
critical. 

Obviously, discussions in this report such as the 
specific procedures to follow in building and
implementing an exit test, their timing, and the
resolution of the issues often overlap. It needs to be
stressed that no procedure will produce a perfect
assessment instrument or process. Perfect assessment
procedures simply do not exist. However, a
test should be as good as it can be given the constraints.
Whether any given test or process is legally
defensible is ultimately a decision for the courts. If
followed, standards established by the measurement
profession make a test more defensible. But
no set of standards should be used as a checklist.

This report cannot (and is not intended to) replace
the advice that a state department of education will
need from an ongoing technical advisory committee.
The advice from such a committee is essential
to the development of a technically and educationally
sound program.

Definition of 
Some Terms 

THERE ARE SOME MEASUREMENT
terms which, while they have fairly standard
definitions among measurement
experts, are not always used by other educators and
lay people with the same meanings. To facilitate
communication, the panel provided the following
definitions of some commonly used terms.

Norm-referenced 

When an assessment is norm-referenced, the scores
made by individual students are compared to the
scores of some identified norm group or groups.
The norm group may be local, state, national, or
(theoretically) global. A requirement of norm
referencing is that the assessment given to all those
who wish to reference scores against the norm
group be given under the same standardized
administrative conditions and scoring procedures
as the original norm group. For example, if students
in the norm group took the assessment under
timed conditions, the students whose scores were to
be compared to the norm group would have to take
the test under the same time constraints. Norm-
referenced scores are not dependent upon any type
of item format. Performance assessments and
multiple-choice tests can both be norm-referenced.
Norm referencing scores does not prohibit setting
standards or employing criterion-referenced test
interpretation.

Criterion-referenced 

In criterion-referencing, one references the scores
by comparing them to a standard or set of standards.
In a high school exit exam there would
typically be one standard, and students whose
scores were at or above the standard would pass and
those whose scores were below the standard would
fail. As with norm referencing, criterion-referencing
does not depend upon any type of item format.
Scores from both multiple-choice and performance
assessments can be criterion-referenced.
Again, to be fair, all students should take the test
under the same administrative and scoring procedures.
However, if there is no external norm group,
one need not be concerned about standardizing the
conditions to those under which the external group
took the test.
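
As an illustration only (it is not part of the Panel's procedures), the two interpretations defined above can be applied to the same raw score in a short Python sketch. The norm-group scores, the cut score, and the function names are hypothetical and chosen purely for the example.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of norm scores strictly below
    return 100.0 * below / len(ordered)


def passes(score, cut_score):
    """Criterion-referenced reading: a score at or above the standard passes."""
    return score >= cut_score


# Hypothetical data, for illustration only.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
cut = 24
student = 25

print(percentile_rank(student, norm_group))  # standing relative to the norm group
print(passes(student, cut))                  # standing relative to the standard
```

The same score receives both interpretations without conflict, which is the point made above: norm referencing does not prohibit setting a standard.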

Multiple-choice Item 

Obviously this type of item has a set of options, and 
the test takers choose (typically) one of the options
as the best answer. These items can be machine
scored very quickly. Contrary to some rhetoric,
multiple-choice items can assess higher-order
thinking skills and indeed can require
problem-solving skills to obtain the correct answer.

Performance-based Assessment 

While this term can be used in a variety of ways, we
will use it to mean all assessments that require some
sort of constructed response which needs to be
scored. Performance-based assessments can, but do
not necessarily, require higher-order thinking skills 
or problem-solving abilities. As mentioned above, 
to be fair, all students should take these performance
exercises under the same administrative
and scoring procedures.

THIS REPORT OFFERS READERS
advice on the issues that need to be
considered (and resolved) and the
steps that need to be taken when
implementing a high school graduation
test. Following a general introduction and a
review of legislation and policies related to the FLE,
issues are discussed under the following headings:

• Curriculum/Test Specification Issues,

• Additional Curriculum and Instructional
Considerations,

• Psychometric Issues,

• Education Issues,

• Legal Issues,

• Policy/Administrative Issues, and

• Human and Financial Resource Issues.

Following those discussions, we include a short 
section on the sequencing of tasks and a section on 
using test scores for accreditation purposes.

The report contains 65 different recommendations
surrounded by extensive discussions. However, the
report cannot (and is not intended to) replace the
advice that the State Department of Education will
need from an ongoing technical advisory committee.
Some of the more important of the issues and
recommendations are discussed below in this
executive summary. However, we urge all readers to
study the total report.

Curriculum/Test
Specification
Issues

AS RECOMMENDED BY THE HIGH
School Exit Assessment Implementation
Committee, the initial assessment areas of
the new MAAP should be limited to reading,
mathematics, and written communication. There
needs to be much more evidence of curricular
validity (opportunity to learn) prior to the implementation
of the new MAAP, and the state has an
obligation to provide professional development to 
local teachers regarding how best to ensure that the 
new competencies are adequately taught. 

We support the development of additional curricular
structures. We believe the term "curricular
structures" is preferable to "frameworks."

Psychometric 
Issues 

VALIDITY REFERS TO THE DEGREE TO
which evidence supports the inferences
that are made from assessment scores.
Department of Education employees need to be
cautioned against making any unsubstantiated
statements about what the assessment measures or
what inferences can be made from the assessment
scores. For example, a statement such as the assessment
would ensure that if students passed they
"would be able to be successful in the real world"
implies evidence of predictive validity. If no
predictive validity exists, such inferences cannot 
legitimately be drawn. 

Exercise development is very important. If the 
developed exercises are faulty, the assessment will
be inadequate. The assessment should be specifically
constructed for the graduation requirement.
It is very unlikely that any "off the shelf" existing
assessment package would be adequate. A Request
for Proposal (RFP) should be issued to develop this
assessment package. The RFP should demand that
the contractor design sufficient safeguards into the
assessment development to ensure adequate
content validity. Both department employees and
an in-state content review team should be involved
in reviewing various processes and products
throughout the development stage.

Performance assessments should be used with care,
and recognition and consideration should be given
to the fact that such assessments are frequently not
as psychometrically sound nor as cost effective as
the more traditional multiple-choice assessments.
It is true that performance assessments can assess
some competencies that cannot be assessed with
multiple-choice items, and they should be used for
such competencies. Nevertheless, multiple-choice 
items are the most efficient and effective format to 
assess many competencies. There need to be com- 
pelling reasons for using performance measures to 
assess those competencies that are amenable to 
assessment with more traditional approaches. 

We applaud the recommendation of the High 
School Exit Assessment Implementation Commit- 
tee that there be two years of pilot work on the new 
MAAP. However, we think their proposed time-line
is too optimistic. We advise that Mississippi move a 
bit more slowly than originally planned, proceed 
with appropriate thoroughness, and document 
every step of the design and implementation 
process. 

Many specific recommendations are made regard- 
ing technical issues such as scoring, standard setting, 
item sensitivity review and item bias studies,
reliability, scaling/reporting, the number of forms 
required, equating, and standards of test adminis- 
tration. Some of these are quite technical in nature 
and will not be covered in this executive summary. 



Education Issues 

THE NEW MAAP NEEDS TO BE ARTICULATED
with the other tests in Mississippi.
Thought needs to be given as to whether the 
various tests and their specific uses within the state 
complement each other or result in competing 
goals. Specific procedures regarding retesting 
should be planned and adopted by the Board. A 
proposal that addresses questions regarding the 
remediation efforts and the respective responsibilities
of the state, the district, and the student needs to
be developed. 



Legal Issues 

LIABILITY ISSUES MUST BE CONSIDERED.
Necessary statutes with respect to liability
should be obtained. All committees and staff
should be informed regarding their potential 
liability. 



Students and their parents need to be given sufficient
notice regarding the new graduation requirement.
The new MAAP should not be implemented
until it can be demonstrated that students have had 
an opportunity to learn the competencies to be 
assessed. 

All procedures, security provisions for the assessment,
and issues concerning accommodations must
be documented. 



Policy/
Administrative
Issues



A PLETHORA OF ISSUES NEED TO BE
resolved, including administrative rules,
frequency of administration, etc.



Human and 
Financial 
Resource Issues

WE CAUTION AGAINST PROCEEDING
without sufficient staff and resources.
There probably needs to be additional
staff in both the student assessment and the
curriculum/instructional units of the department.
Advisory committees need to be established. These
include a testing policy advisory committee, an
item sensitivity review committee, a technical
advisory committee, a content review committee 
for each content area of the assessment, and a 
committee to recommend the cut score. The 
number of contractors should probably be limited 
to two. Sufficient financial resources are needed to 
do the high quality job required to build an educa- 
tionally sound and legally defensible assessment. 
Information from other states should be obtained 
to assist in determining the amount of resources 
needed.

Using Test Scores
for Accreditation 
Purposes 

IN THE ACCREDITATION SYSTEM, THE
"success of the school system" could and
perhaps should be defined in terms of the
number of students who demonstrate the desired
level of performance rather than in terms of average
scores. It may be preferable to use the cumulative
proportion who have passed the MAAP at the
end of some given grade (e.g., grade 11), rather than
the initial pass rate. The MDE should study carefully
the alignment and weighting of all performance
standards used across the elementary, middle, and
high school grades.
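
The difference between the initial pass rate and the cumulative proportion passing can be made concrete with a minimal Python sketch. It is offered only as an illustration under assumed data: the student records, the cut score of 24, and the function name are hypothetical and do not come from Mississippi's program.

```python
def pass_rates(attempts, cut_score):
    """Return (initial pass rate, cumulative pass rate).

    `attempts` maps each student to that student's scores in the order
    the test was taken, up to the end of the grade being reported.
    """
    n = len(attempts)
    # Initial rate: only the first attempt counts.
    initial = sum(1 for scores in attempts.values() if scores[0] >= cut_score)
    # Cumulative rate: a pass on any attempt counts.
    cumulative = sum(
        1 for scores in attempts.values() if any(s >= cut_score for s in scores)
    )
    return initial / n, cumulative / n


# Hypothetical records: one first-time passer, two who pass on retest,
# and one who has not yet passed.
records = {
    "A": [30],
    "B": [20, 26],
    "C": [18, 21, 24],
    "D": [19, 22],
}

initial, cumulative = pass_rates(records, cut_score=24)
print(initial, cumulative)
```

In this made-up example the two metrics differ widely, which is why the choice between them matters when the result feeds an accreditation decision.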



Conclusions 

IT IS POSSIBLE TO DEVELOP A WELL-
designed high school graduation test that meets
curriculum, psychometric, educational, legal,
administrative, and resource requirements. However,
the task is not easy. For it to be done well, a
variety of steps need to be completed. For those
steps to be completed, adequate funding must be
made available.

While all the recommendations are not covered in 
this executive summary, we point out below some of 
the most pertinent aspects that have been consid- 
ered in the report. 

• It is legally inappropriate to hold students
accountable for passing an assessment that
covers material that they have not been taught.
This makes using a high stakes graduation
assessment to drive curricular change somewhat
troublesome. One can use the announcement of
an upcoming assessment to drive curricular
change. This, of course, requires that there be
considerable time between the announcement
of the assessment and its implementation.

• Multiple-choice items can measure higher-
order thinking skills and procedures. Performance
assessments may not offer high enough
psychometric qualities to be used for high
stakes assessments.

• It is unlikely that any "off-the-shelf" test would
be an acceptable high school exit test for the
students of Mississippi.

• Requiring any national norm-referencing
component of the exit exam would complicate
the task of maintaining curricular validity for
the test.

• There must be close articulation among the
various assessment programs. They should not
work at cross purposes, and if they are serving
the same purposes, perhaps less assessment is
needed.

• The use of the various tests in a performance-
based accreditation model requires careful
thought regarding how to set the performance
level and what metric to use in setting the level
(e.g., average performance or percentage of
students above some cut score).

FOREWORD




THE EXTERNAL REVIEW PANEL on
the Mississippi Assessment of Academic
Proficiency (MAAP) was
convened to advise the Mississippi
High School Exit Assessment Implementation
Committee (HSEAIC), the State Board
of Education (MSBE), and the Mississippi Department
of Education (MDE) on important issues
surrounding the proposed high school proficiency
examination and other components of their state
assessment programs. The panel members are
national experts who have first-hand knowledge
and experience with large-scale testing programs;
they brought to the task a wealth of information
and wisdom on the challenging issues that Mississippi
educators will face as they develop and
implement different aspects of their proposed new
assessment programs.

Our specific charges were as follows:

• Review and evaluate legislation and policies
specifically related to the current high school
exit exam in Mississippi, the Functional Literacy
Exam (FLE).

• Review and evaluate the process and procedures
for designing and identifying and/or
determining the new high school exit assessment,
the Mississippi Assessment of Academic
Proficiency (MAAP).

• Review and evaluate curricular and instructional
documents related to statewide assessments
at the secondary level, especially those
more closely related to the FLE, and including
proposed academic standards and competencies
issued to districts for review for MAAP.

• Conduct a two-day site visit in Jackson, Mississippi,
consistent with objectives approved by the
High School Exit Assessment Implementation
Committee; and

• Complete a post-visit report on the status of the
project, to include recommendations to the
High School Assessment Implementation
Committee and the State Board of Education.



All six members of the External Review Panel had
been mailed a package of materials on the background
and implementation plans for the proposed
Mississippi state assessment programs. Three
members of the External Review Panel met in
Jackson, Mississippi on November 30, December 1,
and December 2, 1994 (one additional member
could only attend on December 2). On December 1
and 2, those members of the External Review Panel
present had the opportunity to interact with
individuals from the following groups: Mississippi
Department of Education Administrative Staff,
MDE Assessment Staff, the Superintendent's Task
Force on Accountability and Learning, curriculum
specialists, and the High School Exit Assessment
Implementation Committee. In addition, we met
with the Director of the Education Forum of
Mississippi. During those two days we also received
other printed materials related to the state assessment
programs.

The other two members of the panel planned to
attend the meetings in Jackson, but emergencies
kept them from doing so. Nevertheless, they read all
the materials sent them before and after the
meetings and have read, reacted to, and agree with
this final report.

Three other national experts have reviewed this
panel's report for SERVE, and their comments have
been considered and basically followed in the final
draft of this report. They are obviously not responsible
for any errors in the report, and this report
should not be considered as having been endorsed
by them (although it is our belief that they basically
agree with this report).

It is important to note that portions of the outline
and indeed much of the general content of this
report are patterned after Issues and Recommendations
Regarding Implementation of High School Graduation
Tests, written by William A. Mehrens for the North
Central Regional Educational Laboratory (Mehrens,
W.A., 1993). That report, in turn, was patterned
after a report written by an expert panel, and
chaired by William A. Mehrens, for the Michigan
Department of Education. We appreciate the
consent of the NCREL to use that material.

SECTION I:

Review of Legislation and
Policies Related to the FLE

WE HAVE READ SEVERAL PORTIONS
of Section 37 from a Code related to
Statewide Testing Programs and some
Department reports on the FLE (June, 1993) and the
total Mississippi Statewide Testing Program (Summary
Report for 1994) that inform us (although
probably only partially) regarding relevant legislation
and policies related to the FLE. With respect to
the Code, various portions of Section 37 seem
appropriate legislation for the FLE and indeed for
the proposed MAAP. A February 10, 1987 letter
from [then] state superintendent Boyd to the
Honorable Jack Gordon correctly pointed out that
"... it is cumbersome and untimely to seek legislative
amendments for what are routine educational
decisions." It appears that the legislators agreed
with this position and the Code is reasonably
general. Because the plan for the MAAP is to assess
the same three areas (reading, writing, and mathematics),
the code may be appropriate for the
MAAP although written for the FLE. However, in
various sections of the Code, the terms "basic skills"
and "functional literacy examination" (lower case,
not as a name of an exam) are used. It would
probably be preferable if a new code were written
with language more in keeping with the proposed
MAAP. Additionally, we recommend that someone
from the Attorney General's Office review sections
37-16-9 and 37-16-11 to see if they are worded
appropriately given new federal legislation (such as
the Americans with Disabilities Act [ADA]) and any
new state legislation or legal precedents.

In summary, we recommend that the code be
reviewed to ensure that it is appropriate, based on
the proposed MAAP and the new federal legislation.
Furthermore, until more details are decided
with respect to the MAAP, it is difficult to determine
just which policies or procedures should be
decided by the Department (and confirmed by the
Board) and which may need to be in legislative
Code. Thus, our first recommendation.¹



Recommendation 1: 

Engage the services of someone from the attorney
general's office regarding whether a new
(revised) code needs to be written. The relationship
with this individual should be ongoing so that,
as more specific decisions are made about the
MAAP, that individual can have input into the
decision of whether board policies or legislation
is preferable.

With respect to specific policies regarding the FLE
that have been formulated by the State Department
of Education, those of which we are aware seem
well conceptualized and have served the state well
with respect to the FLE. In the section to follow, we
will be making some specific suggestions with
respect to policies that should be adopted for the
MAAP. To the extent existing policies already cover
our suggestions, they can be ignored. 



SECTION II: 

Issues and Recommendations 
Regarding the Proposed 
MAAP 

MANY ISSUES MUST BE CONSIDERED 
when implementing a high school 
graduation test. This section will address 
several of the more important ones, including 
curriculum/test specification, psychometric, 
educational, legal, policy/administrative, and
human/financial resource issues. Many of the
issues are connected and the resolution of one may
affect the others.

In preparing this report, we were mindful of legal 
and professional guidelines that must be considered
when designing and implementing a required
high school graduation test. Professional standards
for tests are articulated in Standards for Educational
and Psychological Testing (AERA, APA, NCME, 1985).
Many of the legal considerations have been addressed
in the case of Debra P. v. Turlington (1984),
a broad-based challenge to Florida's high
school graduation test requirement.



Curriculum/Test
Specification
Issues

OBVIOUSLY, ONE MUST DECIDE WHAT
to test before beginning to construct the
test. But the task is not a simple one.
General decisions need to be made regarding what
subject matters to test, but more specific decisions
need to be made also, including what subareas to
test in those subject matters and how many questions
should come from each of the subareas. These
decisions are important for educational, psychometric,
and legal reasons. This section discusses and
offers recommendations on some of the more
important issues.

Specify Subject Matters

The high school exit assessment implementation
committee has recommended that "testing will be
only in the areas of reading, math and written
communication because these subjects measure
some important basic high school competencies."
We agree with this recommendation. To add
additional areas could increase the costs and make
time-lines more difficult to meet. However, the state
may wish to add additional subject matter areas in
the future.



Recommendation 2:

The state board should abide by the recommendation
of the implementation committee to limit
the exit assessments to the areas of reading,
mathematics, and written communication.
Additional areas may be added at a later date.

Specify Content within Subjects

After deciding which subject areas to assess, one
must decide how those subject matters are to be
defined and which particular subparts to assess. In
keeping with the terminology used in Mississippi,
one must determine what standards and competencies
should be assessed. Obviously, high school
graduation tests should not sample a state's total
curriculum for measurement, philosophical, and
legal reasons.



A particularly troublesome problem in Mississippi 
is that the Curriculum Structures are in various
stages of revision. For example, the Mississippi
Mathematics Curriculum Structure has a 1995 date
on the outside cover and an October, 1994 date on
the inside cover page. The official mandated
reading and written communication skills curricula
were published in 1986, and a revision
process is scheduled to begin in early 1995. As we
understand the current plans, the English/Language
Arts and Reading will be meshed into one
curriculum. The revision is estimated to take
about 18 months. Current plans (hopes) are that the
curriculum structures will be completed in time for
training in the summer of 1996, the curriculum
structures will be piloted in 1996-1997, and will be
implemented first during the 1997-1998 school
year.

Thus, schools are currently required to teach the 
1986 curricula in reading and written communica- 
tion skills, and they have just been mandated to 
teach the revised mathematics curriculum. 

A second problem has to do with the articulation of 
both the content and the timing of the old FLE and
the proposed MAAP. (We will make subsequent
recommendations following this articulation when
we discuss the timing concerns.) The FLE, still
required for high school graduation and still being
used for accountability purposes, covers content
that the schools should feel obliged to teach. The
FLE covers subparts of the 1986 curricula. Implementing
a new MAAP that covers different (or
perhaps just additional) competencies leaves
schools in a quandary regarding which of the 
competencies should have higher priority in their 
curriculum. 

The process used thus far in Mississippi to determine
the specific standards and competencies that
are to be the basis of the MAAP is considerably less
than exemplary and far from complete. In fact, one
member of the implementation committee has
suggested that the process has been conducted in
somewhat of a backwards fashion.

As we understand the process, the implementation
committee divided into three groups and, with a 
couple hours of work, produced a set of standards 
and competencies in Mathematics. These basically 
came from the brand new Mathematics Curriculum
Structure. For the other two areas they were
produced based on the committee members'
knowledge of national movements in these two
curricular areas. After these statements had been
produced, they were put into a survey (the Academic
Content Standards & Competencies Questionnaire)
and mailed to school districts. For
Reading and Language/Written Communication
the districts were asked to respond yes or no to three
questions for each Standard and Competency:
Currently being taught, Should be part of curriculum,
and Should be tested. Because there exists a
revised curriculum in Mathematics and the standards
and competencies came from that, the
districts were only asked whether they should be
tested. (We assume that there is other documentation
regarding the responses of schools to the first
two questions.) The results of the survey are contained
in a document entitled "Analysis of Results
on the Academic Standards and Competencies
Questionnaire Administered During September
1994" dated October 21, 1994. In general the results
suggested that "all of the standards and competencies
in reading are currently being taught, should
be part of the statewide curriculum, and should be
measured on the exit examination." (Percents for 
Yes responses ranged from 77% to 100% across all 
three questions for all standards and competencies.)
For Language/Written Communication,
"most of the competencies are being taught and 
should be part of the curriculum. However, four 
standards and one competency received low 
percentages of 'yes' responses on the question 
whether they should be measured on the exit 
exam." For Mathematics, "the responses indicated 
that the reviewers felt that one standard and three 
competencies in the current curriculum should not 
be measured on the exit examination." 
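
For readers who want to see how such a yes/no tabulation works, the following Python sketch computes the percentage of "yes" responses for each item and lists the items falling below a threshold. It is an illustration only: the competency labels, the responses, and the 50 percent threshold are hypothetical, and the report does not state what percentage the analysts treated as "low."

```python
def percent_yes(responses):
    """Percentage of 'yes' answers in a list of 'yes'/'no' responses."""
    return 100.0 * sum(1 for r in responses if r == "yes") / len(responses)


def flag_low(survey, question, threshold):
    """Names of items whose percent-yes on `question` is below `threshold`."""
    return sorted(
        item
        for item, answers in survey.items()
        if percent_yes(answers[question]) < threshold
    )


# Hypothetical district responses for two made-up competencies.
survey = {
    "Competency 1": {
        "taught": ["yes"] * 9 + ["no"],
        "tested": ["yes"] * 8 + ["no"] * 2,
    },
    "Competency 2": {
        "taught": ["yes"] * 8 + ["no"] * 2,
        "tested": ["yes"] * 4 + ["no"] * 6,
    },
}

# Assumed threshold of 50 percent, chosen only for this example.
print(flag_low(survey, "tested", 50.0))
```

An item can score well on "currently being taught" and still be flagged on "should be tested," which is the pattern reported above for Language/Written Communication.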

We commend the gathering of the data on the 
competencies and standards. As we understand it,
the standards and competencies for all three areas
will be revised based on the responses to the questions
and the open-ended responses.

In the deliberations regarding the revisions of the
standards and competencies, the HSEAIC needs to
balance two competing interests. In his original
charge to the task force, the Superintendent 
wanted the assessment program "to be designed in 
such a way that education will have no alternative 
but to change dramatically." This charge, coupled 
with the moral and legal needs for holding students 
accountable for learning only if they have had the 
opportunity to learn, presents a clear tension
between two existing forces. This tension is not 
unique to Mississippi. Around the country those 
interested in reforming education have suggested 
(correctly) that assessment can serve as a powerful 
catalyst for educational change. If schools and/or 
students are held accountable for certain standards 
and competencies, then those will be taught. The 
tension arises because it is morally reprehensible 
and legally impermissible not to grant a high
school diploma (a property right) to students
because they have not learned material that has not
been a part of their curriculum. In the Debra P. v.
Turlington precedent, it was held that a student
cannot be denied a high school diploma unless it has
been adequately demonstrated that the student has had
an opportunity to learn the material on the test. This
legal precedent has been incorporated into the
professional Standards for Educational and Psychological
Testing (AERA, APA, NCME, 1985):

When a test is used to make decisions about 
student promotion or graduation, there should 
be evidence that the test covers only the specific
or generalized knowledge, skills, and abilities 
that students have had the opportunity to learn 
(p. 41-42). 

Thus, tension exists between wanting to use a high
stakes assessment (for students) to reform education
and the desire to be fair to the students and to
have a legally permissible assessment. The state
must consider carefully the trade-off between (a)
using assessment of desirable but currently
not-taught material as a catalyst for curricular change
and (b) restricting exit assessment content to
material that has been taught. While we recognize
the tension, we lean toward the fair and legal side
rather than the catalyst for curriculum reform side
of the debate. Thus, we offer the following recommendations.

Recommendation 3: 

There should be another survey of the local
districts to determine the opportunity to learn
the standards and competencies prior to the first
pilot administration of the test, and again at the
time of the first real administration. We strongly
recommend that both students and educators be
surveyed. If the evidence from the opportunity to
learn surveys suggests that the material to be on
the tests has not been adequately covered in the
curriculum, we suggest the exit requirements for
the assessments be postponed.

Recommendation 4: 

Once the specific standards and competencies 
are determined, this information should be 
widely publicized in the local school districts. 
This information should be disseminated in 
enough detail to make students, parents, and 
educators aware of the knowledge and skills to 
be tested without providing so much detail that 
the students can answer the questions without 
understanding the curriculum. 

Recommendation 5: 

If the assessment is to include any material not
currently mandated by the state or taught in the
schools, there should be a state board administrative
rule or statute which specifies that the
local districts must teach this material.



Recommendation 6: 

Once the standards and competencies are
determined, the state must provide professional
development assistance to local teachers if
there is a need.

Additional 
Curriculum and 
Instructional 
Considerations 

WE WOULD LIKE TO RAISE SEVERAL
additional issues concerning Mississippi
curriculum structures in general and
the mathematics curriculum structure in particular.
The mathematics curriculum structure is
singled out because it is complete and because
mathematics will be one of the tested areas at lower
levels and on the new MAAP.

Mississippi Curriculum Structures 

It is the understanding of the External Review
Panel that all of the core subject areas eventually
will have curriculum structures. The mathematics
curriculum structure and its accompanying
process guide as well as the curriculum structure
for social studies were given to the panel for review.
The science curriculum structure will soon be
considered for state adoption by the State Board of
Education according to the science specialist. The
English/Language Arts/Reading curriculum
structure will be interdisciplinary in nature and
will be developed over the next 18 months. In all
cases it was reported that the curriculum structures
attempted or will attempt to embody the current
thinking of national professional organizations
and documents. Obviously this is desirable in that
national publications and conferences will address
various issues that are pertinent to the Mississippi
situation.

In several instances the term "framework" was used
by various persons interviewed as a synonym for
curriculum structure in a particular subject area.
The term "curriculum structure" appeared on both
the mathematics and the social studies documents
reviewed. Based on the composition and organization
of other state and national frameworks, the
Mississippi curriculum structures are more akin to
curriculum guides. Curriculum frameworks are
broad in scope and do not provide teachers with
specific objectives for use in their classrooms.
Frameworks communicate the spirit, not the
specifics, of the mathematics curriculum. In
addition, frameworks address other issues such as
professional development, instructional materials
adoption procedures, essential support systems, etc.
The two Mississippi curriculum structures reviewed
do provide information to the objective level and
define the curriculum in fairly specific ways.
Hence the term curriculum structure is the more
appropriate term to use when referring to the
documents that have been or are being developed
to guide instruction in Mississippi schools.

Recommendation 7:

The current documents used to define the
Mississippi curriculum in appropriate subject
areas should not be referred to as frameworks
because they are more specific than frameworks
in the area of content but do not fully address
other areas that frameworks generally do.
Curriculum structure is the term that appears to
be the choice of Mississippi educators for the
documents described above.



Mathematics Curriculum Structure 

Because the mathematics curriculum structure will
be used to define the mathematics content that will
be tested on the MAAP as well as other measures of
achievement, further analysis will be made of that
document and its companion document, the
mathematics process guide. Both of these have
recently been completed and some teacher orientation
sessions held during the fall of 1994.

Six content strands have been identified in the
Mississippi Mathematics Curriculum Structure as
strands that will be taught at every grade level.
These are as follows: number sense/numeration/
operations, patterns/relations/functions, algebra,
measurement, geometry, and statistics/probability.
Objectives are identified for each strand for each
grade level K-8 and for specific courses in the upper
grades curriculum. Prealgebra can be taught as
early as grade 7, and algebra can be taught as early
as grade 8. The objectives for each strand seem to
fit adequately into the grades K-8 curriculum since
each strand will likely be addressed at each of these
grades. Some of the objectives in some of the upper
level courses seem "forced." While it is true that
these six strands will be addressed in the secondary
curriculum, it may not be true that each of the
strands will appropriately fit into each course of
the secondary curriculum. For example, the
statistics/probability strand seems to be one that
does not fit well in each of the courses. Specifically
for calculus, the objectives under the statistics/
probability strand (p. 7~) are as follows: solve
related rate problems, solve optimization problems,
and use integration to solve real life problems. The
connection of the objectives with the strand is not
readily apparent. Hence, it may not be true that
every strand will conveniently fit into every course.



Recommendation 8: 

We suggest that the mathematics curriculum 
structure be reexamined in light of the above 
issues. Each strand may not fit well into every 
course at the secondary level and need not be 
forced to do so. Hence, as tests are constructed for 
various purposes, this issue should be recog- 
nized. 



The seven process strands identified in The
Mississippi Mathematics Process Guide are as follows:
problem solving, communicating, reasoning,
connecting, estimating, using technology, and
assessing. If the content strands identified earlier
were written horizontally across the top of a grid
and the process strands were written down the left
side of the same grid, each cell of the grid would
represent the interaction of a content strand and a
process strand. In general, this is the kind of
information that the process guide provides for each of
grades K-8. Each cell would contain an activity that
indicates how a content strand and a process strand
might interact in a desirable fashion. For
Prealgebra, Algebra I, Geometry, Algebra II, and
Trigonometry, activities are described which
generally reflect combinations of several of the
process strands to some topics in those courses.
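
As a rough illustration of the grid described above, the following Python sketch crosses the six content strands with the seven process strands. The variable names, the function name, and the empty placeholder cells are assumptions made for illustration only; the actual activities are those given in the process guide.

```python
from itertools import product

# Strand names as listed in this report.
CONTENT_STRANDS = [
    "number sense/numeration/operations",
    "patterns/relations/functions",
    "algebra",
    "measurement",
    "geometry",
    "statistics/probability",
]
PROCESS_STRANDS = [
    "problem solving",
    "communicating",
    "reasoning",
    "connecting",
    "estimating",
    "using technology",
    "assessing",
]


def build_grid(content, process):
    """Return a dict keyed by (process strand, content strand).

    Each value is a placeholder (None) where a grade-level activity
    from the process guide would be recorded.
    """
    return {(p, c): None for p, c in product(process, content)}


grid = build_grid(CONTENT_STRANDS, PROCESS_STRANDS)
print(len(grid))  # 7 process strands x 6 content strands = 42 cells
```

Laying the two documents out this way would make it easy to see which cells have no activity tied to a stated curriculum objective.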

It appears that the content selected (through the
activities) for interaction with the process strands is
not always that delineated in the objectives of the
mathematics curriculum structure. In general the
link between the process strands of the process
guide and the content objectives of the curriculum
structure is not always explicit. We believe that
this is the crucial link about which teachers are
seeking help. How well teachers understand this
relationship will determine how the content is
presented in classrooms.



Recommendation 9: 

It would be desirable to have a closer correlation
between the objectives of the curriculum structure
and one or more of the process strands of the
process guide. Teachers could then better understand
the "what" and the "how" of their curriculum.
Because both documents have already been
published, some emphasis should be given to this
connection between process and content strands
in staff development sessions at all levels. The
degree to which all of the curricular and assessment
emphases correlate to one another at all
levels should be considered.



Of all the tests, only the MAAP (eventually) can
deny a student a diploma. However, all of the other
assessments should be providing "help" for students
to do well on the MAAP (see recommendation
iY2).

The Functional Literacy 
Examination (FLE) 

Obviously, since the composite passing rate of the
current FLE is about 94%, the material being tested
has been taught by teachers and is being comprehended
by almost all students. The new MAAP will
likely assess more advanced content when it is
ready. However, in the interim, teachers are being
asked to adjust their curriculum and their teaching
to correspond to the mathematics curriculum
structure and the mathematics process guide, but
passing the FLE will remain a graduation requirement.
Some attention should be given to how the
current FLE content will be incorporated into the
new initiatives so that students will continue to
perform well, even while addressing the new
curriculum structures, until the new MAAP is
ready.

Psychometric 
Issues 

ALL PARTICIPANTS IN THE TEST CONSTRUCTION,
administration, scoring, and
reporting process should be aware of the
Standards for Educational and Psychological Testing
mentioned earlier. This section is divided into
subsections on validity, item development, mix of item
formats, field testing, scoring, standard setting,
item sensitivity reviews and bias studies, reliability,
scaling/reporting, number of forms, equating, and
standardization of administrations.

Validity

Validity is the most important consideration in
test evaluation...[and] refers to the degree to
which that evidence supports the inferences
that are made from the scores (AERA, APA,
NCME, 1985, p. 9).

Although validity is a unitary concept, evidence of
validity may be accumulated in many ways.
Traditionally, such evidence has been categorized as
content, criterion-related, and construct validity
evidence. Different inferences that may be drawn
from a test score demand different types of validity
evidence. It is important not to make insupportable
inferences from the scores. The test name itself
may lead to an insupportable inference. For
example, calling a test a "Functional Literacy
Examination," as has been done for the previous
exit examination, would support the inference that
a person who failed the test was illiterate. Thus, the
name should be chosen with care. We support the
suggestion made by the HSEAIC to call the test the
Mississippi Assessment of Academic Proficiency.

In addition to what a test is called, it is important
that public officials do not suggest in their writings
or speaking that inferences can be drawn from the
assessment which are not supportable. For example,
in our meetings on December 1 and 2 we
heard one person suggest that the assessment
would ensure that if students passed they "would be
able to be successful in the real world." Such a
statement implies that there is some evidence of
predictive validity. If no predictive validity exists,
such inferences cannot legitimately be drawn.
Thus, the following recommendations.



Recommendation 10:

Every effort should be made to caution Department
of Education employees, the State Board of
Education, and other spokespersons against
making any unsubstantiated statements about
what the assessment measures or what inferences
can be made from the assessment scores. An
official statement should be made regarding the
assessment and the inferences that can be drawn
from the scores. There should be either good
logical reasons or empirical evidence for the
inferences that are to be drawn.

Recommendation 11:

Professional development activities related to
the new test should include discussions about
valid statements that may be made about the test
results.

Exercise (test item) Development

If the developed exercises are faulty, the assessment is
inadequate. One of the most important aspects of a
good assessment is that it, indeed, measures the
standards and competencies (hereafter just called
content) that have been listed in the publications
describing the assessment content. One of the
major Standards to be considered is as follows:

When a test is to be used to certify the successful
completion of a given level of
education...both the test domain and the
instructional domain at the given level of
education should be described in sufficient
detail, without compromising test security, so
that the agreement between the test domain
and the content domain can be evaluated
(AERA, APA, NCME, 1985, p. 52).

This evaluation should not be left for the test's
critics to make after the test has been given. This
evaluation needs to be made at the time an assessment
is chosen or developed. Ensuring the test/
curriculum match and communicating the test
domain to others is likely to be more difficult if the
curriculum structures stress quite broad, general
competencies. For example, one proposed curriculum
structure (science) will list competencies as well
as sample objectives for the local districts to cover. A
problem is that the test will probably assess at the
objective level, and some districts could be instructing
to objectives that match the broad competencies
on the curriculum structures but not the
specific objectives assessed on the test.

In ensuring a match between what the assessment
measures and the publicized content, it is unlikely
that any "off the shelf" existing assessment package
would be adequate. Most likely, an assessment
package will need to be specifically built to match
the Mississippi standards and competencies.



Recommendation 12: 

Plan on constructing an assessment to be used 
specifically for the graduation requirement. Be 
very skeptical of any contractor who suggests an 
off the shelf test will adequately meet the re- 
quirements of a Mississippi High School exit 
examination. 



In developing the assessment, there must be several 
steps taken to ensure an adequate match between 
content and test specifications. 



Recommendation 13: 

Demand that the contractor design sufficient 
safeguards to ensure that the assessment ad- 
equately samples the defined content. 



Items can be faulty for a variety of reasons. If the
original items are faulty, either because they do not
match the defined content or for other reasons, it is 
difficult to "fix" the test at the field test stage of 
development. Any item substantially revised 
following a field test should be subjected to another 
field test. Thus, it is important to have well-trained 
item writers. 



Recommendation 14: 

Any request for proposal (RFP) for item/test
development must be written to elicit sufficient
information from the prospective contractors so
that the bid will not be awarded to an incompetent
contractor. The department will need to
audit closely the work of the contractor to ensure
adequate item development, tryouts, revisions,
etc. It is critically important to have an in-state
content area review team composed of teachers,
curriculum supervisors, and university curriculum
specialists determine the quality of the item
specifications and the items and recommend
appropriate revision to the contractor.



Because the FLE has been given in the same three
subject matter areas called for in MAAP, it may be
possible that some of the items used in that exam
can be used (or revised) for the MAAP. Given the
expense of item development, it would be wasteful
to discard items if they match the new standards
and competencies.



Recommendation 15: 

Both the department and an in-state content
review team should review the items currently
being used on the FLE to determine whether any
of them match the new standards and competencies.
If they are of sufficiently good quality,
consider using those items on the MAAP.

Mix of Item Formats

An issue that should be considered early is the mix 
of item formats. Some critics of multiple-choice 
items suggest incorrectly that such a format cannot 
tap into higher-order thinking skills. The rhetoric 
of such critics suggests that multiple-choice items 
can only measure basic, isolated bits of recall. Such 
is clearly not the case and we hope that those 
designing the assessment do not harbor such faulty 
beliefs. While multiple-choice items can measure 
higher-order thinking and problem-solving skills it 
is certainly true that multiple-choice items cannot 
measure all possible outcomes. However, good item 
writers are able to write appropriate (e.g., tapping 
objectives beyond factual recall) multiple-choice 
items for mathematics, reading, and some portions 
of language arts curricula. (Writing should probably
be assessed by asking students to write.)

Those who specify the proportion of items from
different formats should be informed by measurement
experts regarding which competencies can
be assessed by which types of formats. State department
officials need to recognize at the outset that it
will be expensive, both in terms of time and money,
to gather performance assessments on every high
school graduate. Wainer and Thissen (1993) have
found in their study of the Advanced Placement
Chemistry Test that a 75 minute multiple-choice
test in chemistry is as reliable as a 185 minute
constructed response test. Because of scoring cost
differences, the relative difference in costs for a
given level of reliability is truly staggering. Wainer
and Thissen (1993) estimated that if one, for example,
wanted a test with a reliability of 0.92, it
would cost 3000 times as much for a constructed
response test as for a multiple-choice one.
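
The link between test length and reliability can be illustrated with the Spearman-Brown formula from classical test theory. The sketch below is only an illustration: it is not claimed to be the method Wainer and Thissen used, the function names are ours, and the starting reliabilities and the 60-minute base length are hypothetical values chosen to show the arithmetic.

```python
def spearman_brown(rho, k):
    """Reliability of a test lengthened by factor k (classical test theory)."""
    return k * rho / (1 + (k - 1) * rho)


def length_factor(rho, target):
    """Factor by which a test must be lengthened to reach the target reliability.

    Obtained by solving the Spearman-Brown formula for k.
    """
    return target * (1 - rho) / (rho * (1 - target))


# Hypothetical values, NOT taken from Wainer and Thissen (1993).
BASE_MINUTES = 60
rho_mc = 0.85   # assumed reliability of a 60-minute multiple-choice section
rho_cr = 0.60   # assumed reliability of a 60-minute constructed response section
target = 0.92

for label, rho in [("multiple-choice", rho_mc), ("constructed response", rho_cr)]:
    k = length_factor(rho, target)
    print(f"{label}: lengthen by {k:.2f}x, about {BASE_MINUTES * k:.0f} minutes")
```

Because each additional constructed response exercise must also be hand scored, the difference in testing time understates the difference in cost.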

Not only are there reliability and cost problems
associated with performance assessments, but there
are a myriad of other problems in areas such as
validity, standard setting, and equating. Those
non-measurement educators pushing for performance
assessments should attempt to become educated
with respect to these measurement problems. A
recent survey of alternative assessments (Wolcott &
Hoffman, 1994) has concluded that "attaching high
stakes to portfolio and performance assessment
seems premature at this point" (p. vi). Certainly, if
performance assessment (constructed response)
exercises are used, there are a number of additional
considerations regarding such issues as scoring,
scaling, equating, and reporting that are addressed
in later sections of this report.



Recommendation 16: 
The department should provide, possibly
through the technical advisory committee,
statewide staff development for educators to
increase awareness of measurement issues as well
as the high cost associated with performance
assessments.



Recommendation 17: 

Unless there is a compelling non-measurement 
reason, do not use the constructed response item 
format for competencies that can be assessed via 
multiple-choice items. Do not use any portfolio 
assessments (one type of performance assessment) 
for the MAAP.



Finally, the State Department of Education must 
make a decision regarding how many items to 
develop initially. While this decision is related to 
other decisions (such as how many times a year to 
test, whether any given form can be reused, and 
whether anchor items are used for equating
purposes), two general recommendations can be 
made. 



Recommendation 18: 

Contract for enough items initially so that after 
losses through pilot and field testing sufficient 
items will remain to build forms through the 
second administration year. There should be 
a longer range plan to develop a complete bank
of items. 



Recommendation 19:

Reissue a contract in sufficient time to have
items developed and tried out (possibly embedded
in a live form) prior to their being needed
for the third year.



Pilot (field) Testing 

We are very positively impressed with the
recommendation of the High School Exit Assessment
Implementation Committee that there be two years
of pilot work on the new MAAP. However, as we
understand their current proposal for the first
pilot, it would be to evaluate a variety of pilot
assessment instruments from different vendors.
While vendors will likely bid to produce a pilot, we
are not convinced that this is the best way to proceed.
One would have to write an RFP for the pilots,
fund all those vendors who produce reasonable
responses, test them all, go through some process to
pick the correct vendor, etc. It seems preferable
(certainly more efficient) to place an RFP that is as
specific as possible for the "real" test and then
Mississippi simply should choose the best response
to that RFP. This would allow more focused attention
from the very beginning on the test that will
actually be used. Thus, the following
recommendation.



Recommendation 20: 

Issue one RFP for the development of the actual 
MAAP. Do not issue a separate RFP inviting 
vendors to build pilot tests from which you will 
choose "the best". 



Another point that should be considered is
whether some of the items for the new MAAP can
be pilot tested through being embedded into the
FLE. There are both positive and negative aspects
to such a procedure. One positive aspect is that it
should be cheaper. Another is that the students
who take pilot items embedded into real tests will
be motivated to try their best. A possible negative is
that the items may be so different from those in the
FLE that it will be apparent that they are pilot items.
A second negative is that harder items on untaught
content embedded in the FLE will negatively
impact the morale (and therefore the performance)
of the students who must pass the FLE. At any rate,
some consideration should be given to piloting
items within the FLE. The External Review Panel
does not have a consensus recommendation regarding
the wisdom of this. However the pilot testing is
done, it is essential that enough items survive the
pilot tests so that there will remain enough items
for two years' worth of actual forms. Depending on
the frequency of test administration, this would
mean enough items for four or six forms of the
tests.
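
The arithmetic behind "enough items" can be sketched as follows. This is a minimal planning estimate under stated assumptions: the function name, the number of items per form, and the survival rate are hypothetical and would need to be replaced with figures from the department and its contractor. It also ignores anchor items shared across forms, which would reduce the total.

```python
import math


def items_to_commission(forms, items_per_form, survival_rate):
    """Estimate how many items to contract for before pilot testing.

    forms          -- number of operational forms needed (e.g., 4 or 6)
    items_per_form -- items on each form (hypothetical here)
    survival_rate  -- assumed fraction of items that survive pilot
                      and field testing (hypothetical here)
    """
    if not 0 < survival_rate <= 1:
        raise ValueError("survival_rate must be in (0, 1]")
    needed = forms * items_per_form
    # Round before taking the ceiling to guard against floating-point noise.
    return math.ceil(round(needed / survival_rate, 9))


# Hypothetical inputs: 60 items per form, half of all items surviving.
for forms in (4, 6):
    print(forms, "forms:", items_to_commission(forms, 60, 0.5), "items")
```

The estimate is most sensitive to the survival rate, so a conservative value is the safer planning assumption.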

With respect to the proposed timeline, the HSEAIC
has suggested the first pilot be in the spring of 1995
for all eligible students in grade 10, the second pilot
to be in the fall of 1995 to obtain data for standard
setting, and the first "real" administration to be to
10th graders in the fall of 1996. We have some
concerns about this. First, we believe that it is very
optimistic to expect the first pilot to be ready by the
spring of 1995. Secondly, it would be preferable to
have the pilot administered at the same time of
year as the first real assessment will be administered.
Thirdly, we think it would be nearly impossible
to have a quality first pilot ready by the fall of
1995. Finally, as mentioned in the section on content
specifications, we are concerned that the standards
and competencies to be measured are not yet
determined and that the process for developing the
new curriculum structures for Reading and
Language/Communication has hardly begun and is
not scheduled for implementation until the
1997-1998 school year.



Recommendation 21: 

Be flexible on the time lines. We believe it would 
be preferable to delay the first pilot until the fall 
of 1996, the second pilot in the fall of 1997, and 
the first real assessment in the fall of 1998. This 
means that the first graduating class affected
would be the class of 2001.



Following the pilot testing, the results should be
disaggregated by appropriate demographic
characteristics (e.g., gender, ethnicity, geographic
location, students with limited English proficiency,
students with disabilities, etc.). The results should
be studied carefully for possible bias, quality of the
items, etc. Careful consideration should be given to
how widely the results of the pilot should be shared.
If there are two pilots (as suggested by the
implementation committee and as we strongly support)
and the second pilot seems to be pretty much like
the final assessment will be, we believe the results
should be shared very thoroughly. This second
pilot should ideally be viewed as a trial run of the
actual implementation, rather than a repiloting of
the items. This is a great opportunity to address all
problem areas and program logistics. This would be
an ideal time to try out some report forms to see if
they are intelligible (user friendly). Further, because
this assessment will cover more challenging
content than the FLE it replaces, it is important that
districts have an advance awareness of what will
likely be lower scores and, depending on the cut
score, likely a higher failure rate than they are used
to seeing from the FLE.
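
A minimal sketch of the disaggregation step is given below. The record layout, the field names, the cut score, and the four student records are all invented for illustration; real pilot files would be far larger and would carry the demographic fields the department actually collects.

```python
from collections import defaultdict


def pass_rates_by_group(records, group_key, cut_score):
    """Return {group: (n_students, pass_rate)} for one demographic field."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_students, n_passed]
    for rec in records:
        tally = counts[rec[group_key]]
        tally[0] += 1
        if rec["score"] >= cut_score:
            tally[1] += 1
    return {g: (n, passed / n) for g, (n, passed) in counts.items()}


# Invented records for illustration only.
pilot = [
    {"score": 72, "gender": "F", "region": "north"},
    {"score": 55, "gender": "M", "region": "north"},
    {"score": 81, "gender": "F", "region": "south"},
    {"score": 64, "gender": "M", "region": "south"},
]

for field in ("gender", "region"):
    print(field, pass_rates_by_group(pilot, field, cut_score=65))
```

Reporting the group size alongside each rate matters, since rates based on very small groups are unstable and easily misread.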

Recommendation 22: 

Consider what detail of reporting should follow 
the pilot tests. We believe the first pilot should be 
viewed as a combination research and development
ment effort and results need not be widely 
shared. The results of the second pilot should be 
widely shared. 

Recommendation 23: 

Develop procedures and decision rules regard- 
ing which items in the pilots are OK, which need 
to be revised, and which need to be discarded.
Develop documentation procedures regarding
these decisions.



Scoring 

The scoring of the objective portions of the examination
should be contracted to a national scoring
service. Commercial contractors have a great deal
of experience and are well-equipped to do this
scoring accurately and efficiently.

There are legitimate arguments both for having
the performance assessments (constructed response
items) scored by in-state teachers and for
having them scored by an outside-the-state contractor.
Reviewers of this document have made
arguments on both sides of this issue. An argument
for in-state teacher scoring is that teachers often
enjoy and learn much from the scoring process.
Using in-state teachers to score the papers may
either add to or subtract from the credibility of the
process, depending in part upon the quality of the
training and monitoring process. At any rate,
teachers should not be scoring papers from their
own or surrounding districts as they could be aware
of the identity of the students whose papers are
being scored. If in-state teachers are to be used, the
scoring sessions should be conducted by a scoring
contractor with the clear understanding that
teachers who cannot score reliably or validly will be
dismissed.
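
One way a scoring contractor might monitor whether a scorer is scoring reliably is to compare that scorer's ratings with reference ratings on a set of pre-scored papers. The sketch below computes two commonly used statistics, exact agreement and Cohen's kappa. The report does not specify a monitoring method, so the choice of statistics, the function names, and the sample ratings are assumptions for illustration.

```python
from collections import Counter


def exact_agreement(a, b):
    """Proportion of papers on which two sets of ratings match exactly."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    p_observed = exact_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_chance = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters gave every paper the same single score
    return (p_observed - p_chance) / (1 - p_chance)


# Invented ratings on a 1-4 rubric, for illustration only.
teacher_scores = [1, 2, 3, 4, 2, 3]
reference_scores = [1, 2, 3, 4, 3, 3]

print(round(exact_agreement(teacher_scores, reference_scores), 3))
print(round(cohens_kappa(teacher_scores, reference_scores), 3))
```

Any threshold for retraining or dismissal would be a policy decision to be set with the contractor and the technical advisory committee.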



Strong arguments can be made for using out-of-state
personnel to score subjective tests. The major
ones are timely scoring and costs. One state costed
out the scoring of writing and found that using
classroom teachers was the more expensive option.
An "army" of teachers must leave their classrooms
for at least four to six weeks two or three times a
year (depending on how often the assessment is
administered). These individuals must be paid their
regular rates and substitutes must be provided.
More important, however, their teaching expertise
is lost during this time. Their students will never
have the benefit of that lost instruction. Ways can
be found to involve some in-state teachers without
the disadvantages, for example, by using teams of
teachers to observe the scoring process and using
committees of teachers to assist in making policy
decisions about scoring. We believe this is preferable
to actual in-state scoring. The scoring of high-stakes
assessments in a reliable and valid manner is
far more important than whatever staff development
or public relations value there might be in
using in-state scoring.

One could, of course, use both in-state and out-of-state
scoring and compare the results. The state
should consider carefully the alternatives with respect to
validity, credibility of results, costs, and ability to receive
timely scores. We offer the following
recommendations.



Recommendation 24: 

Contract for professional scoring.

Recommendation 25: 

Develop a professional development packet to be 
used with teachers based upon the results of each 
year's scoring. 

Standard Setting 

When using a cut score on a test to determine
whether individuals pass or fail, "the cut score
becomes the linchpin in the decision process"
(AERA, APA, NCME, 1985, p. 50). Yet, standard
setting is a subjective process, and typically there is
dissonance between where policy makers think the
cut score should be and the implication of that cut
score for the failure rate (i.e., policymakers would
typically think the cut score should be reasonably
high until they discover that such a cut score
produces a "high" failure rate).

Much professional literature existsou the method- 
ology for standard setting. In general, this measure- 
ment literal ure supports the f ollowing points: (1) A 
trained standard-setting commit tee should be 
involved in making recommendations regarding 
the standard. (2) This committee should use an 
iterative process that includes information about 
the f ailure rate by major ethnic groups (and per- 
haps other special populations), (3) The impact data 
should be obtained from the first administration, not the 
pilot test. (More needs to be said about this point. 
While the measurement literature would agree 
with this, and the panel members agree with this 
point when weal ing their measurement "hats," 
there is a legitimate (non-ineasureinenf)argnmcni 
on t hi* other side. As one panel member has 
pointed out, there is a quest inn regarding the 
practicality and tiinelinessof setting the standard 
aflerihv f irst administration. From a practical 
standpoint, this puts the Department and State 
Board in the difficult position of administering a 
high stakes test and being unable to tell parents and 
students tin- score thev w ill need to make to pass. 
Also at issue is how fast the students will need to 
receive the scores. The "wheels of policy-making" 
move slow ly so thai having the standard-setting 
committee convene af ter the test data are back and 
then making a recommendation to the Board could 
result in a considerable time lag. Thus, there may be 
considerable pressure (o set the cut score before the 
first administration, and some states do thai. 
I lowever, other states do wail until real data are in, 
and measurement experts clearly prefer waiting.) 
(4) The recommendations from the standard-
setting committee, a description of the process they
used, a discussion of the relative costs of false
positives and false negatives, and the fact that
scores will go up across time should be taken to the
group officially responsible for setting the stan-
dard, and this group should make the final decision
regarding where to set the cut score. On high-stakes
tests where the content of the tests is at a reasonably
high level (as on the MAAP), it would generally be
considered inappropriate to simply set the stan-
dard at 70%, as we believe was done on the FLE.
However, the philosophical concept of allowing for
a lower score on one test to be partially compen-
sated for by a higher score on another test (as is
being done on the FLE) is not inappropriate from a
technical point of view if it is philosophically/
educationally acceptable. If one does employ a
partial compensatory model, the members of the
various standard setting committees should be
aware of this and they should be trained accord-
ingly. 11 We do not give specific recommendations
about the training process, but rather, recommend
further advice about this difficult problem.
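
The difference between a conjunctive rule and a partial compensatory
rule can be sketched in a few lines. The example is illustrative only:
the test names, the cut scores, and the minimum "floor" scores are
hypothetical and are not taken from the FLE or the MAAP.

```python
# Hypothetical illustration of two pass/fail decision rules.
# All names and numbers below are assumptions made for this sketch.

def passes_conjunctive(scores, cuts):
    """Pass only if every test score meets its own cut score."""
    return all(scores[test] >= cut for test, cut in cuts.items())


def passes_partial_compensatory(scores, cuts, floors):
    """Pass if the total meets the summed cut scores and no single
    score falls below its floor (the limit on compensation)."""
    total_ok = sum(scores[t] for t in cuts) >= sum(cuts.values())
    floors_ok = all(scores[t] >= floors[t] for t in cuts)
    return total_ok and floors_ok


cuts = {"reading": 70, "mathematics": 70}
floors = {"reading": 65, "mathematics": 65}
student = {"reading": 68, "mathematics": 75}

print(passes_conjunctive(student, cuts))
print(passes_partial_compensatory(student, cuts, floors))
```

Under these assumed numbers the student fails the conjunctive rule but
passes the partial compensatory rule, which is the kind of consequence
a standard-setting committee would need to understand before
recommending cut scores.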

The following broad recommendations are made 
regarding standard setting. 

Recommendation 26: 

Reconsider the current plan to set the standard 
using the pilot study results as impact data.
While this may be preferable from some practi- 
cal points of view, it is not the approach pre- 
ferred by measurement experts. 

Recommendation 27:

Appoint and train a standard-setting commit-
tee. This committee should be composed of
individuals who are both qualified and credible.
A majority of the committee probably should be
Mississippi public school educators with
knowledge and experience both in the subject 
matter being assessed and at the grade level of 
the students being assessed. 12 

Recommendation 28: 

Use a technical advisory committee to help
develop a specific standard setting procedure. 13

Recommendation 29: 

The State Board of Education should establish a
passing score through administrative rule based
upon a recommendation by the superintendent 
of public instruction with the advice of appro- 
priate committees. 

Setting standards for performance assessments has
been considerably less researched than setting
standards for multiple-choice tests. One could, of
course, set separate standards for the two item
formats within a subject matter and use a conjunc-
tive model for making the pass decisions (i.e., one
would have to pass both "subtests" within the
subject matter). Setting separate standards for the
two formats is not recommended by us for a variety
of psychometric reasons. In the first place, it is
unlikely that the performance assessment portion
of the assessment would be long enough for one to
place any confidence in either its reliability or
validity as a stand-alone assessment. Second, such a
process would make the equating problems even
more difficult. Nevertheless, combining both types
of formats poses formidable problems. Given the
state of the art, it is reasonable to suggest the
following recommendation.

Recommendation 30: 

Engage in several small scale pilot study ap- 
proaches to setting standards on assessments 
composed of multiple-choice and performance 
assessments. Do statistical analyses regarding 
the impact of these approaches. 

Because the initial failure rate probably will be
greater than the failure rate after the test has been
in place for several years, it may be reasonable to set
incremental cut scores over time. This allows the
cut score to be set so that an inordinate number of
students do not fail at the beginning, but the state is
not locked into a cut score that is lower than desir-
able. The advantage of setting these incremental
cut scores at the beginning is that it may be easier to
do than to reset the cut scores later.
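
One way to picture incremental cut scores is as a schedule adopted once,
at the time of the initial decision. The sketch below assumes a simple
linear phase-in; the graduating classes, the starting cut score, and the
target cut score are hypothetical values chosen only for illustration.

```python
# Hypothetical linear phase-in of a cut score across graduating classes.
# The years and score values are assumptions, not panel recommendations.

def incremental_cut_scores(first_class, initial_cut, target_cut, n_classes):
    """Return {graduating class: cut score}, rising in equal steps from
    initial_cut (first class) to target_cut (last class in the schedule)."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    if n_classes == 1:
        return {first_class: target_cut}
    step = (target_cut - initial_cut) / (n_classes - 1)
    return {
        first_class + i: round(initial_cut + i * step)
        for i in range(n_classes)
    }


schedule = incremental_cut_scores(2001, initial_cut=60, target_cut=70, n_classes=3)
for graduating_class, cut in schedule.items():
    print(graduating_class, cut)
```

Publishing the whole schedule in the original rule is what spares the
board from having to reopen the cut score decision later.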



Recommendation 31: 

Consider setting incremental cut scores for
different graduating classes when the State
Board of Education makes its initial decision.

Another issue regarding standard setting is what
standard should be set for accreditation purposes.
We address the issue of using various pieces of test
data for accreditation in more detail later in the
report. However, it seems worth mentioning here
that if one wants schools to strive to get all students
above the cut score so that they can graduate, any
accreditation use rule should not force districts to
choose between working for accreditation and
working for a high (e.g., 100%) pass rate for the
students.



Item Sensitivity Reviews and 
Empirical Bias Studies 

All assessments should be designed to be free of
ethnic, cultural, and gender "bias." There are well-
developed methods to eliminate such bias. The first
is in the training of the item writers. They should
be trained to avoid certain stereotypical words and
phrases that may be offensive or may give an unfair
advantage to a particular ethnic, cultural, or
gender group. (Another group that should be
considered is the Vocational Technical students.
One individual we interviewed was concerned
about the fairness of writing prompts for those
individuals. This is a legitimate concern and should
be kept in mind. It would also be appropriate to
have the item writers keep in mind other special
populations such as those who have certain disabili-
ties.)

A second procedure is to have all items reviewed by
a committee of individuals specifically trained to
detect items that may show such insensitivity. The
item sensitivity review team can be trained to focus
on a variety of different groups such as those
discussed in the previous paragraph.

A third procedure is to compute "differential item
functioning" statistics on all of the items based on a
pilot study (field tryout). Due to the numbers of
individuals that exist in the different groups, these
statistical procedures can probably be done only on
major groups such as both genders and the pre-
dominant ethnic groups. Those items that are
"flagged" by such a statistical analysis should then
be brought back to the item sensitivity review
committee, and probably to the relevant subject
matter content committee, for a final determina-
tion of whether those items should be removed
from the item bank. A fourth procedure is to collect
committee members' judgments on whether or not
the test as a whole is relatively free of bias.
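
The report does not name a particular differential item functioning
statistic. As one commonly used example, the sketch below computes the
Mantel-Haenszel common odds ratio for a single item, with examinees
matched on total score, and converts it to the delta-scale index
(-2.35 times the natural log of the odds ratio). The counts are
invented for illustration, and the choice of this statistic is an
assumption of the sketch, not a recommendation of the panel.

```python
import math

# Each stratum is one total-score level, holding counts for one item:
# (reference right, reference wrong, focal right, focal wrong).
# The counts used below are made up for illustration.

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata; 1.0 means no DIF signal."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Delta-scale index; negative values favor the reference group."""
    return -2.35 * math.log(odds_ratio)


item_counts = [(40, 10, 30, 20), (20, 30, 10, 40)]
alpha = mantel_haenszel_odds_ratio(item_counts)
print(round(alpha, 3), round(mh_delta(alpha), 3))
```

A flagged item would still go back to the review committees for
judgment; the statistic only identifies candidates for review.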

It should be pointed out that when a test is com-
posed of items with different formats and those
items may carry different "weights," the task of
scaling the test (see below), which is a prerequisite
to empirical differential item functioning proce-
dures, is a bit more intricate, and prospective
contractors should respond to an RFP with details
regarding how they will proceed with such a task.

It is important to note that while the test should be
free from "bias," this does not mean that all ethnic,
cultural, and gender subgroups should necessarily
have the same mean level of performance. If some
groups truly have not achieved as many of the skills
in one of the subject matter areas (or indeed on a
particular item), the test (item) should reflect that
true state of affairs. Based on the findings from
many previous assessments, the Mississippi Depart-
ment of Education should anticipate that not all
subgroups are achieving at the same level and that
the test scores will show those differences. The
purpose of the item sensitivity reviews and the
differential item functioning studies is to gather
data to allow for informed judgments about
whether the individual items and/or the test items
collectively contain irrelevant content that results
in unfairness to a subgroup.

Recommendation 32: 

The item sensitivity reviews should be completed 
by a committee that is selected and trained 
specifically for this task. 14 Most members should
represent the state's predominant minority 
groups. However, it would be wise to include at 
least one member of the committee who is a 
minority group member from out-of-state and a 
recognized expert in this area. 

Recommendation 33: 

Conduct statistical differential item functioning
studies. Items that function
differentially for different groups should be
flagged and reviewed (but not necessarily
discarded) by an item sensitivity review commit-
tee (conceivably, but not necessarily, the commit-
tee used for the item sensitivity review) and a
content review committee. Clear guidelines
should be developed regarding how to respond to
flagged items, how to handle committee mem-
bers' disagreements, etc.



Reliability 

Reliability pertains to the amount of test variance
that is due to random error. Data should have high
reliability. There is currently some debate about
just how reliability should best be calculated for
performance assessments. The state may want to
obtain specific advice from a technical advisory
committee on the best way to combine perfor-
mance and multiple-choice assessments and
whether to obtain reliability through a battery
reliability formula or some other approach. While
those responsible for monitoring the quality of the
assessment should study various approaches and
ask the contractor in an RFP to provide specific
recommendations, we offer the following recom-
mendation.



Recommendation 34: 

Obtain the following reliability estimates:
internal consistency, interrater reliability, 15
generalizability across performance samples,
and the reliability or standard error at the cut
score.
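
Two of these estimates can be illustrated with short calculations. The
sketch below computes coefficient alpha (one common internal consistency
estimate) and an overall standard error of measurement. Both choices are
assumptions of the sketch: the report does not prescribe a formula, and
the standard error at the cut score is properly a conditional quantity
that this overall figure only approximates. The response data are
invented.

```python
import math
from statistics import variance

# Rows are students, columns are items (1 = right, 0 = wrong).
# The data are made up for illustration.

def cronbach_alpha(item_scores):
    """Coefficient alpha from a students-by-items score matrix."""
    k = len(item_scores[0])
    item_variances = [
        variance([row[i] for row in item_scores]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(score_sd, reliability):
    """Overall SEM: score standard deviation times sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1 - reliability)


responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))
print(standard_error_of_measurement(score_sd=10, reliability=0.91))
```

The second figure gives a rough sense of how far an observed score near
the cut score might sit from the student's true score.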



Scaling/Reporting

Once tests have been scored, the students' results
must be reported. Generally, it is not considered
wise to report the "raw scores" (e.g., number of
items right on a test). The scores are typically
reported based on some mathematical transforma-
tion of the raw scores so that the transformed scores
have certain statistical properties (e.g., a specific
mean and standard deviation). When multiple-
choice items and performance assessment items are
combined into the same test and one wants a single
score, there are difficult decisions to be made
regarding how to combine the two sets of items.
The easiest approach (not necessarily the best) is to
determine in advance how many raw score points
to assign each level of performance on the perfor-
mance assessment items and to simply add these
points to the number of multiple-choice items an
individual gets correct. A second approach would
be to score the two sets of items separately and then
combine those two scores through some a priori
weighting scheme. That scheme could be based on
a logical, philosophical weighting or an empirical
weighting based on any number of different
variables such as their separate reliability estimates,
subtest information functions, etc. A third ap-
proach would be to use an IRT model that scores all
the items together. There, the typical choices are a
one-parameter or a multiple-parameter model.
Another approach is not to combine the two types
of assessments at all, but to scale them separately
and to set two separate cut scores. How the combin-
ing gets done (or whether these scores are kept
separate) has relevance for reporting and for
equating approaches considered later. How the
weighting/combining gets done also could be
extremely important to an individual student. In
addition, a particular combination method may be
more beneficial to some subgroup (e.g., one gender
or some ethnic group).
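
The first two approaches, and the linear transformation to a reporting
scale, can be sketched as follows. The points per performance level, the
weights, and the scale mean and standard deviation are hypothetical
choices made only to show the arithmetic; the IRT approach is not shown.

```python
# Hypothetical illustration of combining item formats and scaling.
# All point values, weights, and scale constants are assumptions.

def additive_raw_score(mc_correct, performance_levels, points_per_level):
    """Approach 1: add preassigned points for each performance level
    to the number of multiple-choice items answered correctly."""
    return mc_correct + sum(points_per_level[lvl] for lvl in performance_levels)


def weighted_composite(mc_proportion, performance_proportion, mc_weight):
    """Approach 2: score each format separately (here as proportions
    of the maximum) and combine with an a priori weight."""
    return mc_weight * mc_proportion + (1 - mc_weight) * performance_proportion


def linear_scaled_score(raw, raw_mean, raw_sd, scale_mean, scale_sd):
    """Transform a raw score to a scale with a chosen mean and SD."""
    return scale_mean + scale_sd * (raw - raw_mean) / raw_sd


points = {0: 0, 1: 2, 2: 4, 3: 6}
raw = additive_raw_score(mc_correct=40, performance_levels=[3, 2],
                         points_per_level=points)
print(raw)
print(weighted_composite(0.8, 0.5, mc_weight=0.7))
print(linear_scaled_score(raw, raw_mean=45, raw_sd=10,
                          scale_mean=300, scale_sd=50))
```

Running the same students through each rule is one simple way to see
whether a combining method changes who passes.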

Recommendation 35: 

Consider carefully how the performance assess- 
ments and multiple-choice items are to be 
combined. There should be expert advice regard- 
ing this and empirical studies showing the 
differential impacts of various approaches on
individual students, groups of students, ease and 
quality of equating procedures, etc. 

Because high school graduation tests are not
typically designed to differentiate among those
passing, and because one should not encourage use
of information on the difference in students' scores
above the cut score (e.g., for employment decisions
after graduation or for district accreditation
decisions), one would typically report scores above
the cut score only as a "pass."

Other questions arise for those who do not pass.
Educators typically want high school graduation
tests to be diagnostic. They believe that failing
students should be given some information that
would facilitate efficient and effective remediation
efforts. This is understandable, but it is difficult to
design a test that is of high quality both for deter-
mining accurately who deserves to pass and for
determining just what the specific diagnostic
recommendations should be for individuals who
fail. Thus the dilemma. Reporting sub-test scores
may imply more diagnostic information than can
be justified based on such technical considerations
as the reliability of the difference scores. However,
not to report sub-test scores limits the usefulness of
the scores for remediation. Because reporting sub-
test scores is a multifaceted and technical issue, it
deserves careful attention.
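
The concern about the reliability of difference scores can be made
concrete with the classical formula for the difference between two
sub-test scores that have equal variances. The reliabilities and the
correlation below are hypothetical, and the equal-variance assumption
is a simplification made for this sketch.

```python
# Classical reliability of a difference score X - Y, assuming the two
# sub-tests have equal variances. Input values are made up.

def difference_score_reliability(rel_x, rel_y, corr_xy):
    """((rel_x + rel_y) / 2 - corr_xy) / (1 - corr_xy)."""
    if corr_xy >= 1:
        raise ValueError("correlation must be below 1")
    return ((rel_x + rel_y) / 2 - corr_xy) / (1 - corr_xy)


print(difference_score_reliability(rel_x=0.80, rel_y=0.80, corr_xy=0.70))
```

With these assumed values, two sub-tests that each have a reliability
of .80 yield a difference score with a reliability of only about .33,
which is why sub-test profiles can suggest more diagnostic precision
than the data support.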

If the decision is made to report sub-test scores (as
we suspect it will be), it will have implications for
the test specifications and test development.
Troublesome problems may arise if the assessments
make use of a variety of item types. The subscales
composed of primarily performance assessment
exercises are not likely to be as equivalent across
years as the subscales composed of multiple-choice
items. This may have implications for how to
communicate the subscales. Further, the item
weighting of the constructed response and mul-
tiple-choice items will impact the decisions regard-
ing subscale reporting.

Careful thought should be given to what has been
learned regarding how the subscores reported for
the FLE have been interpreted and used. Personnel
from the State Department of Education should
make concerted efforts to determine the accept-
ability (both from a public acceptability point of
view and from a psychometric point of view) of the
current approach and determine whether changes
need to be made for the MAAP.

The issue of which transformed scores (scaled
scores) to use for reporting is also a difficult techni-
cal issue that cannot be solved in the abstract.
Numerous scores could be used. Using the same
scaled scores across subject matters does have some
advantages, and we would recommend it. However,
using a common scale across subject areas may have
implications for test development.

Recommendation 36: 

Scores should be reported as pass or fail. Those 
individuals who fail should be given some
information regarding how close they were to
passing, and they should be given some diagnos-
tic information that would facilitate remedia-
tion efforts. Important technical details (e.g.,
reliability of difference scores) regarding
various methods of reporting diagnostic infor- 
mation should be worked out and specific plans 
should be formulated by a technical advisory 
committee prior to approval of the final test 
specifications. 

Recommendation 37:

Use a common scale across subject matter areas. 
This takes some advance planning to avoid 
adopting a scale that is appropriate for one test 
but unworkable for another. 






Recommendation 38:

Consider whether it would be better to keep the
same scaled score approach as is being used on
the FLE or whether it might be better to change
the score to avoid confusing the two. 16

If the assessment were to be norm-referenced,
there should be a reporting of an individual's
norm-referenced score. Although we realize that
there has been some discussion in Mississippi of the
MAAP being norm-referenced as well as criterion-
referenced, we see extremely troublesome mea-
surement and legal problems in trying to develop
an exit assessment that both matches the Missis-
sippi curriculum and is normed on a representative
national sample. We would much prefer to have the
MDE use the norm-referenced ninth grade TAP or
the Work Keys test as a measure of how Mississippi
students are doing in comparison to a national
average than to try to make the exit examination
both nationally norm-referenced yet have the
content be representative of state curriculum
structures.



Number of Forms 

The number of forms that need to be available for
the MAAP deserves careful consideration. (We are
not considering forms that are identical except for
the pilot items as being separate forms. It should
also be pointed out that the number of forms
available for the subject matter tests and for Work
Keys are relevant issues for consideration.) We offer
the following recommendation regarding forms.

Recommendation 39: 

Develop rules/procedures for designing forms 
for makeup examinations and out-of-school (i.e.,
adult education) populations. 17 Determine
whether forms will be reused. Determine how
many times you will administer the test each
year. Determine equating procedures (e.g.,
number of anchor items). Based on these consid-
erations, develop enough alternate forms to last
through the second year of test administration.
Develop more forms/items during this time so
that a sufficient supply is continuously avail- 
able. 



Equating 

High school graduation test questions need to
remain secure, and they cannot be reused to any
great extent. (We would make the same statement
about the course exams and Work Keys tests.)
However, to be fair to individuals who take differ-
ent forms of the test, the forms need to be equated.
It is particularly important that diploma-sanction
tests be equated at the cut score, so that a perfor-
mance level that was considered a pass on one form
of the test would not be considered a fail on a
different form. There are many ways to equate, but
we should stress that the process becomes a bit
more difficult if there are both performance
assessment items and multiple-choice items. The
process becomes even more difficult if the propor-
tion of the item types does not remain constant
across forms and/or if the decision has been made
to scale the two item forms separately.

There are many ways to equate, but the two more
common general procedures considered viable for
diploma sanction tests are to use anchor items or to
pre-equate. Anchor item equating is generally
preferable to pre-equating for final cut score
decisions, because the subareas of the test will likely
be differently affected by instructional changes.
Pre-equating should be done when initially build-
ing various test forms. The cut score will, of course,
be set on the original form. The wording of the rule
adopting a cut score needs to be carefully consid-
ered so that it is clear how to equate that score to
scores on subsequent forms of the test.
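
As a simplified illustration of carrying a cut score from the original
form to a later form, the sketch below uses linear (mean and standard
deviation) equating. This is a stand-in chosen for brevity: it is not
the anchor-item or pre-equating procedure discussed above, it assumes
the two forms were taken by comparable groups, and all of the score
statistics are invented.

```python
# Linear equating sketch: place a score from one form on the scale of
# another by matching means and standard deviations. Values are made up.

def linear_equate(score, from_mean, from_sd, to_mean, to_sd):
    """Return the score on the 'to' form with the same standardized
    position as `score` on the 'from' form."""
    return to_mean + to_sd * (score - from_mean) / from_sd


# Cut score of 60 adopted on the original form (mean 50, SD 10);
# the new form turned out slightly harder (mean 48, SD 8).
equivalent_cut = linear_equate(60, from_mean=50, from_sd=10,
                               to_mean=48, to_sd=8)
print(equivalent_cut)
```

Whatever method is adopted, the rule that sets the cut score should
make clear that the equated score, not the same raw number, defines a
pass on later forms.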

Recommendation 40: 

Use a technical advisory committee to help 
develop specific equating procedures.

Standardization of Test 
Administration 

Recommendation 41: 

Carefully consider policies regarding all test 
administration conditions. For example, the 
decision whether or not to use calculators in the
mathematics test must be constant across all
administrative sites. Train personnel ad-
equately to administer the tests. Consider
random auditing of the administration process
to ensure uniformity throughout the state.

Education Issues

ALL OF THE ISSUES INVOLVED IN A
high school graduation test (as well as the 
other Mississippi assessments) could be 
considered educational issues. However, in this 
section, five special kinds will be discussed: articula- 
tion with other tests, retesting, remediation, special 
education, and adult education. 

Articulation with Other Tests 

The Mississippi high school graduation test should 
be articulated with the other tests in Mississippi. 
(This seems like an appropriate time to stress that
there needs to be articulation among the various
committees (e.g., norm-referenced committee, exit
test implementation committee, etc.). Several
individuals we interviewed thought the communi-
cation across these committees was less than ideal.)
Of particular concern should be the early testing
(e.g., the ITBS, the Performance Assessments, and the
TAP given in grades 4-9), the end of course tests, and
the Work Keys tests. As we understand the current
accreditation procedure, the various tests all count
in a formula for determining accreditation.
Thought needs to be given as to whether these
various tests and their specific uses within the state
complement each other or result in competing 
goals. 

In a subsequent section on accreditation we will 
discuss some of these issues further. Here we would 
like to point out that, whatever the uses of other 
tests in an accreditation system, Mississippi should 
administer tests in earlier grades that would assist 
in identifying students who may not be acquiring 
prerequisite knowledge and skills at the expected 
rate to enable them to pass the MAAP. Attention
needs to be given to the relationship of the content
on the grades 4-9 tests to the content that is on the
MAAP. Ideally, the early grade testing would be
testing for the specific prerequisite knowledge and
skills that are important for passing the MAAP. If
not, the early tests could not be used to identify
those likely to need additional instructional sup-
port prior to taking the MAAP. While ideally, there
would be a relationship between the contents, it
seems important to call the reader's attention to
some additional concerns. It is surely possible for a
student not to have acquired some prerequisite
knowledge and skills by, say, grade 8, yet that
student, with appropriate effort, may well acquire
the knowledge and skills necessary to pass the 
MAAP. Likewise, doing well on an 8th grade test 
that covers prerequisite outcome measures in no 
way guarantees that a student will acquire the 
outcome measures sufficient to pass the MAAP. 
This latter point needs to be made very clear to all 
students, parents, and educators. Early tests should 
not and will not cover all the competencies assessed 
on the MAAP. 



Recommendation 42: 

Have subject matter experts study the content of 
the grades 4-9 tests and the competencies to be 
measured on the MAAP. If appropriate content 
articulation does not exist, determine whether 
the problem should be fixed by changing the 
content of the early tests or the MAAP. 



Recommendation 43: 

Even if close content articulation exists, be 
cautious about any "predictive" interpretation 
of the scores of a single individual from testing 
in earlier grades. Such tests should be thought of 
as providing only an early awareness, not a 
strong, reliable predictor. 



Consideration should also be given to whether the 
course subject matter tests assess the same compe- 
tencies as the MAAP. For example, if the content of 
the Algebra I examination covers many of the 
competencies assessed on the MAAP, and if a 
student passes the Algebra I exam prior to grade 10. 
consideration should be given as to whether it 
should still be necessary to pass the mathematics 
portion of the MAAP. 



Recommendation 44: 

Consider whether passing any of the course tests 
can serve as alternatives to passing certain of
the MAAP tests.

Finally, there should be articulation with the Work 
Keys tests. If the content of those tests is quite
different, then we have the same types of articula-
tion concerns as would exist if the other assess-
ments differ from the exit examinations.

As has probably been apparent from the previous
discussion, there needs to be concern with the total 
amount of testing as well as the articulation of the 
tests. It is certainly possible that there is simply too 
much testing being planned at the high school level 
and we believe educators in Mississippi need to 
review each projected program and assess the 
purpose of it. 

Retesting 

Retest issues are of two types: how and whether to 
give makeup tests for absentees (not a retest of the 
same person), and how many chances a single
individual should have to pass the test. 

If someone is ill or has an excused absence on the 
day of a test, that person should have an opportu- 
nity to make up the test as soon as possible. The 
state must consider whether the district/building 
should have a window of opportunity in which it 
can retain the tests and provide an opportunity for
makeup tests. This provision seems appropriate if
the window of opportunity is not too long; we
suggest approximately one week total. Special 
consideration should be given to the issue of 
whether alternative forms of the writing prompts 
and the performance assessment portions of the 
other assessments need to be used for makeup 
examinations. Extended absences should be 
handled on a different basis. Written policies 
should be formulated regarding all makeup 
procedures.

Other retake issues include the following: Is the 
student who fails a test area (e.g., writing) required 
only to retake the failed area; is a student who fails
the test obligated to retake that test during each
succeeding administration or may the student "sit
out"; and when a school is closed by a crisis, can the
test administration be rescheduled for that particu-
lar school outside of the announced "window"?



Recommendation 45: 

The department should prepare and the board 
should adopt specific written procedures
regarding makeup examination provisions. 



The number of permissible retakes also should be a 
matter of policy. Evidence in other states suggests 
that four or five total attempts prior to scheduled
graduation should be sufficient. A person should
be allowed free, unlimited retakes through an adult 
education program if the person has not passed 
during the regular high school time period. 



Recommendation 46: 

The department should prepare and the board
should adopt specific written rules regarding
the number of retakes that should be allowed 
and how many attempts a student should be 
given prior to the time he/she is scheduled to 
graduate. 



Remediation 

We are aware that the Mississippi Department of 
Education is interested in pursuing its role in 
advancing the professional development of teach-
ers. This is commendable with or without high
stakes exit examinations. When a state requires that
students acquire certain competencies (as mea-
sured on an exit exam) prior to graduating, that
state should have some responsibility for assisting
the local schools in planning for remediation. It
seems wise that a state rule should be established to
provide that a child who fails must be given the
opportunity for remediation. 18

Several issues need to be considered regarding 
remediation. For example, who is responsible for 
designing remediation materials: the local school
or the state? If the state designs the materials, is it 
responsible for evaluating the materials for their 
effectiveness? Should the state hold workshops 
around the state on how to remediate? Should the 
state attempt to control the publication of materi- 
als by commercial publishers? If remediation 
programs increase the costs to the local districts, 
will they be reimbursed by the state? How can
remediation be completed without the negative
side effects of tracking or grouping? If a student
who has not passed the graduation test require-
ments but has passed all other requirements
decides to return to school for a 13th year, can that
student be counted for state aid? Will local schools
be required to document their offers of remedia-
tion to those who fail?



Recommendation 47: 

Develop a detailed proposal (set of guidelines)
that addresses questions regarding remediation
efforts and the respective responsibilities of the
state, the district, and the student for remedia-
tion efforts. 19 This set of guidelines should then
be approved by the department and the state
board.



Recommendation 48: 
Carefully investigate liability issues with 
assistance from the attorney general's office.
Attempt to obtain necessary statutes with respect 
to liability. Inform all committees and all staff 
regarding their potential liability. 



Legal Issues 

ANY HIGH SCHOOL GRADUATION 
test should be built so that it is technically 
sound. Furthermore, decisions made from
the data should be applied fairly. Generally speak-
ing, if one can provide evidence regarding those
issues, the process should be legally defensible.
Thus, we have already addressed legal issues and
will continue to do so in sections following this one.
However, some more specific legal issues should be
kept in mind and are addressed in this section.

First, the state should be aware that tests are fre- 
quently questioned from a technical standpoint. 
The courts will use the Standards for Educational and
Psychological Testing (AERA, APA, NCME, 1985). (It
should be pointed out that the process of revising
these standards is underway, and readers must
remain alert to what the new standards say when
they are published, probably not before 1990.)
With respect to legal issues, it is wise to obtain legal
involvement early from the attorney general's
office. This may be less urgent for Mississippi
because they already have an exit examination in
place. However, it would be our expectation that the
failure rate for the new examination may be
considerably greater than what it has been for the
FLE. Further, there is discussion of using the
MAAP to force curricular change, a tactic that is
likely to meet legal challenge.

Liability Issues 

A thorough investigation of liability issues should
be made. Do existing state statutes protect employ-
ees? If the state department retains the service of
local educators, does any state statute protect them?
Can a teacher be sued because of a claim that he/
she did not teach some content, or teach it well
enough? Are committee members who make
recommendations covered under state statutes?



Notification 

One of the main legal issues other than test quality 
is due process. Individuals need sufficient notifica-
tion of the new graduation requirement. This
notification should be detailed with respect to the
standards and competencies that the tests will
cover. Details concerning how to notify students
and parents need to be worked out. Certified letters
need not be sent to every child/parent. Neverthe-
less, there should be some documentation that the
notices were sent (announced). Procedures such as
placing notices in a student handbook, placing
notices on report cards, etc., should be considered.
One suggestion is to produce a video tape to show
all students and have each district provide an
affidavit that they have shown the tape to all ninth
graders. Whatever is done regarding notification
for the first cohort should be continued for all
future classes.



Recommendation 49: 

Schools should be notified immediately regard- 
ing the NEW graduation requirement and the 
information disseminated to all teachers. 
Students and their parents should be notified no
later than the year in which affected students
are in the ninth grade. The public in general
should be notified immediately following
decisions made by the state. 20 



Timing 

As mentioned above, due process requires sufficient
notification. Thus, the amount of lead time be-
comes an important legal (and educational) issue.
As we discussed previously (see Recommendation
21), the exam may indeed not be developed in time
for the necessary piloting and revisions so that it
can be used for a graduation requirement prior to
the graduating class of 2001. Whether or not the
test could actually be ready, there is the issue of
sufficient due process. A general rule of thumb
might be that the students who are to be impacted
by the assessment be notified of the specific
standards and competencies to be assessed no later
than when they are in ninth grade. This would
mean that notification would need to be given no
later than the fall of 1997 if our suggested guideline
is followed. 21 Given that the estimated implementa-
tion date for the new English/Language Arts and
Reading Frameworks is 1997-1998, this seems like
reasonable timing.

Related to the timing issue is when to phase out the
FLE. If the new MAAP is to be required for the
graduating class of 2001, the FLE would need to be
administered through the spring of 2000.



Recommendation 50: 

The FLE should not be used for accreditation
purposes after the first year the MAAP is used
for such. 22



Opportunity to Learn

As we have discussed previously, it is illegal to 
require students to pass a test that covers standards 
and competencies unless it can be shown that the 
students have had the opportunity to learn that set 
of material. It would be inappropriate to require
the new exams for graduation until it could be
demonstrated that the new curriculums were in
place in the districts, that the teachers had received
sufficient professional development so that they
knew how to effectively teach the new curriculum,
and that, indeed, the students had an opportunity
to learn the new material. We remind our readers
of an earlier recommendation that if it cannot be 
shown that students have had an opportunity to 
learn the new curriculum, the assessment should be 
postponed. 

Documentation 

The general issue of documentation also needs
some attention. The lack of various types of
documentation can become a central focus of a
lawsuit. We are not totally aware of the documentation
policies for the FLE. However, these should be
reviewed to determine whether they are sufficiently
detailed. For example, when committees
review items for sensitivity or bias, consideration
should be given as to whether a complete record
should be kept regarding which individuals considered
which items biased and what changes to the
items resulted if they were revised. One also needs
to consider how long any documentation should be
kept.



Recommendation 51: 

If sufficient documentation policies do not exist
for the FLE, the department should prepare,
and the board should adopt, detailed policies
regarding what should be documented and how
long the documentation should be kept on file. A
general suggestion is that all documentation be
kept for a period of at least five years following
the school year in which the test was administered.
Consider keeping "forever" the initial
development documentation and records about
when, why, and how procedures are adopted
and/or changed.



Security Provisions 

We are aware that Section 37-16-4 of the current
Code regarding the Statewide Testing Program
discusses violations of test security procedures and
penalties. However, we believe the department
should consider whether there need to be additional
statements regarding what constitutes
inappropriate, unethical, unprofessional, and
possibly illegal behavior on the part of educators
and students with respect to violating administrative
standards, security procedures, and so forth.



Recommendation 52: 

In consultation with the attorney general's 
office, the department should prepare and the 
State Board of Education should adopt rules on 
what constitutes inappropriate behavior on the 
part of educators or students with respect to test 
taking, security issues, and so forth, and what 
penalties will be imposed for violation of these 
rules. These rules and the penalties should be
disseminated to educators, students, and parents
prior to the initial administration of the MAAP.



Recommendation 53: 

Test security provisions must be a shared responsibility
among the contractor for test administration,
the state department, and the local
schools.
Accommodations

As mentioned earlier in the discussion prior to 
Recommendation 1, attention needs to be given to 
whether the current Code and other policies are
sufficient regarding accommodation practices. 



Recommendation 54: 

Review accommodation codes/regulations to
determine whether they need to be updated. 



Policy/Administrative Issues

A PLETHORA OF POLICY/ADMINISTRATIVE
decisions must be made and rules
must be passed prior to implementing a
high school graduation test requirement. Obviously,
the State Department already has made many
of the necessary decisions and rules because they
would be much the same for the FLE and the
proposed MAAP. We list below a set of questions
that, if they have not already been answered, will
need to be considered by the Department.

• Who approves the various test construction and
test administrative procedures?

• Who develops, approves, and oversees all test
security issues? Is there a procedure in place to
monitor the districts to assure they do not issue
diplomas to those who have not passed the
MAAP?

• Are there sufficient equipment/facilities for
storage of secure materials, shredding out-of-date
secure materials, and so on?

• Has it been determined how to handle retakes
for those who have completed all other high
school requirements and have "left" school?

• Has a policy been established for issuing
diplomas to adults?

• Do all transfer students from other states (even
those transferring during the second semester
of their senior year) need to pass the MAAP to
receive a diploma? What if those students have
passed another state's graduation test in the
same subject? How about students who transfer
within the state from non-public to public
schools?

• What, if any, accommodations or exemptions
will be permitted for students on an IEP or 504
Plan? What about those whose language
spoken in the home is not English, migrant
students who move in and out of the state, or
those who are simply foreign exchange students
spending less than two years in the state?
What is the intent with respect to language of
the exams? Is it the intent that all tests should be
in English, or only that students should read
and write English?

• What happens to a senior in the year prior to
the effective date of this graduation requirement
who fails a required second semester
course that must be completed in summer
school or in the first semester of the next year
(when the graduation requirement applies)?
Will such a student also have to pass the test
even though he/she originally was not required to
do so?

• Will the state have a policy on participation in
commencement exercises by students who
complete curriculum requirements, but not
test requirements? Will such a student receive
anything, e.g., a certificate of attendance or a
document verifying accomplishments?

• Who approves various external committee
appointments? Should there be written policies
regarding representation on those committees?

• Who finally sets or approves the cut score
following the recommendation from a cut
score committee? Will the respective costs of
false positives and false negatives be considered
and, if so, by whom?

• Is the system of tracking students that is being
used for the FLE working OK? If not, how does it
need to be changed?

• Are the reports being used for the FLE sufficiently
detailed? Should there be more attention
to studying the results via such procedures
as disaggregating the results by ethnicity,
courses taken, etc.? Should the results for the
separate item formats (e.g., multiple-choice and
performance assessments) be reported separately,
and should there be the same disaggregation
of the results by item format as for the total
results?

• Who will develop the total annual test administration
plan and how will it be communicated
to all school districts? Will test administrations
be monitored by the state?

Recommendation 55: 

Consider questions such as those raised above
and make the necessary decisions concerning 
them. The department and the Board of Educa- 
tion must devote adequate time to the identifica- 
tion and resolution of critical questions that 
must be addressed. 



Human and Financial Resource Issues

LEGISLATORS CANNOT BE EXPECTED
to recognize the huge additional costs of
implementing a high school graduation
assessment that is composed of both multiple-choice
and performance assessment exercises. The
State Department of Education must provide a
rationale to them to support any request for
additional human and financial resources. This
section discusses needs in staffing, advisory committees,
contractors, and financial resources.

Staffing Needs

It is our understanding that the Student Assessment
unit has a professional staff of either five or
six individuals, counting the director. (Our notes
suggest five total, but the Summary Report for 1994
Mississippi Statewide Testing Program lists six individuals
in the Office of Student Assessment.) Given
the total assessment program for which this staff
has responsibility, we believe it is important to
increase the size of the staff. Even though a large
portion of the work will be contracted out to
various vendors, there remains a great deal of
additional work that must be done by staff. For
example, an individual should be assigned major
responsibility for each content area to be assessed.
A measurement specialist with technical background
will need to spend considerable time
writing RFPs. Specific tasks for the contractors
need to be developed and the contractors' execution
of these tasks needs to be monitored. Someone
must coordinate the assessment staff in the areas of
test development, test administration, and test use
and reporting. There needs to be an overall supervisor.

There may also need to be additional staff in the
curriculum/instructional area. We are not aware of
just how many professionals are employed in these
areas, but with the advent of the revised curriculum
structures and the new assessment, much professional
development of staff in the districts needs to
take place.



Recommendation 56: 

The department should conduct a careful study
to assess additional staffing needs in the student
assessment and curriculum/instructional units.
We should think that, at a minimum, the new set
of assessment plans would call for some additional
professional staff in the student assessment
unit. There probably needs to be additional
staff added in the curriculum/instructional
unit as well.



Advisory Committees 

The need for several advisory committees has
already been discussed in various places in this
report, and further information about our recommendations
regarding the composition of these
committees can be found in the next section.
However, for the ease of individuals interested in
human and financial resource needs, they are listed
here under a specific recommendation.

Recommendation 57: 

If they have not already been established, the 
following advisory committees should be ap- 
pointed: A testing policy advisory committee, an 
item sensitivity review committee, a technical 
advisory committee, a content review committee 
in each content area of the assessment, and a
committee to recommend a standard (cut-score) 
(one such committee for each subject assessed). 9 



Contractors 

Mississippi has considerable experience dealing 
with contractors, and we suspect they have done an 
admirable job. While we are unaware of state 
restrictions on contracting procedures, we hope the 
Department has, or will be given, the freedom to 
grant single source contracts and to issue agreements
that extend across fiscal years. We have one
recommendation, based on considerable
experience: it is advantageous to keep the
number of contractors reasonably small.



Recommendation 58: 

For the MAAP, the department should consider
using at most two contractors: one for test
development and formal field tryouts and
another for test administration, scoring, and
reporting.



Financial Resources

The need for appropriate staff, advisory committees,
and outside contractors relates to financial
needs. The specific costs depend on decisions
regarding many of the issues already discussed in
this report. Costs under some test designs easily can
be more than triple what they would be under
other designs. For example, the higher the proportion
of the assessments that are performance-based,
the higher will be the costs of administering
and scoring the assessments. Two specific issues
that have not been considered earlier and may have
cost implications are (1) whether non-public
students will be tested (even though they are not
required to pass to receive a diploma) and, if so,
who will pay the cost, and (2) whether the state is
responsible for the financing of state-required
local school functions (e.g., professional development
of staff and costs of local administration of
the assessments). Other states can provide detailed
information about various costs, and we urge
Mississippi personnel to contact them. For example,
Florida and Ohio have been using multiple-choice
tests for high school graduation for years,
and directors such as Tom Fisher (Florida) and
Roger Trent (Ohio) would be able to provide
estimates for what their states are paying. Louisiana
has a program with many similarities to the proposed
Mississippi program, and Rebecca Christian
and her staff would be a useful resource. Michigan
is in the process of having a high school exit test
developed (for state endorsement rather than
diploma purposes, but that would not affect costs)
that will include performance assessment exercises.
Diane Smolen could provide information regarding
the development costs for Michigan assessments.
Many other states also have high school exit
tests, and it is our experience that the directors are
very willing to assist other state directors by providing
information regarding costs of their programs.



Recommendation 59:

Obtain information from other states with
similar programs regarding fiscal needs. Make
recommendations to the legislature that are
sufficient to cover department needs, and make
clear to them that the task simply cannot be
accomplished without adequate support.



SECTION III:



Sequencing of Tasks 

IN DESIGNING A PROGRAM FOR A HIGH
school graduation test, it is useful to have in
mind the total set of processes and approximate
completion dates for various activities. While
we recognize that Mississippi already has considerable
experience in designing and implementing a
high school graduation test (the FLE), it might be
useful to list, in abbreviated fashion, the tasks we
believe are required and some suggested timelines.
The timelines are based on the assumption that our
recommendation regarding the new test impacting
the graduating class of 2001 is followed. Obviously,
the suggested sequence and timelines are based on
certain assumptions about decisions reached.
Different decisions would result in different steps/
timelines.

It is important to note that many process strands 
actually run concurrently. Furthermore, missing 
one or more of the targeted deadlines can mean
that all other deadlines following that one are 
missed and that the program cannot be imple- 
mented on time. Both the legislators and the Board 
of Education need to understand that a lot of work 
needs to be done and that it takes sufficient staff 
and resources to accomplish the tasks. 

Below is one possible sequence of activities that
could be carried out to develop and implement the
MAAP. It represents a sequence that we believe to
be a reasonable approach. Detailed suggestions
about how to perform those activities are not
present in this section. The text and recommendations
in the previous sections cover many such
details.

Sample Tasks and Completion Dates 

(assuming requirements are for the 2001 graduat- 
ing class) 



Task 1: Establish appropriate advisory
committees. Do this as soon as possible. This
task involves determining what committees
need to be established, determining criteria for
selection of the committee members, soliciting
and evaluating the nominations, officially
appointing and training the committee members,
and maintaining the committees over
time. We suggest the following committees, with
the understanding that it might be wise to have
some overlap of committee members:

• Department of Education Steering Committee:
This committee should represent the
various units of the Department whose tasks
will be impacted by this program (e.g., the
Student Assessment, Curriculum, Vocational,
Adult, and Special Education units).

• Testing Policy Advisory Committee: This
committee would be much like the previous
task force or the current implementation
committee. It should represent the state education
community to advise on policy.

• Item Sensitivity Review Committee: This
committee should be composed mostly of
members of the state's minority groups, but
with at least one member from out of state who
is a recognized expert on bias issues in assessment.

• Technical Advisory Committee: This committee
should be composed of at least one
measurement expert from within the state and
at least one individual who has been (or is) the
director of a similar competency testing
program in another state. Other members of
the committee should be widely recognized as
measurement experts, and they (as a group)
should have expertise in test development,
scaling, equating, and all other major areas
about which the department may wish to
obtain advice.

• Content Review Committees: These committees
should be composed of content experts
(mostly or totally state residents) in each area of
the test. State department personnel who are
specialists in the respective subject matter areas
should sit on these committees, although it is
debatable whether they should have the right
to vote.

• Standard-setting Committees: These committees
should be composed primarily of individuals
who are both qualified and credible. They
should probably be primarily composed of
educators in the state who have knowledge and
experience both in the subject matter being
assessed and at the grade level of the students
being assessed.

Task 2: Determine exactly what standards
and competencies will be assessed. As we
suggested earlier in the report, this is a very
important task and one we believe is far from
complete. Once this is determined, there
should probably be specific approval of those
standards and competencies by the State Board
of Education. Ideally, this task should be
completed no later than May, 1995.

Task 3: Disseminate information about Task
2 to all students who will be impacted,
parents, business leaders, and other relevant
constituencies. Complete before schools let
out for the summer of 1995.

Task 4: Complete test specifications for each
test area. Complete by August, 1995.

Task 5: Hire a contractor for development of
item specifications, item/test development,
and field tryouts. Complete by December,
1995.

Task 6: Have the contractor complete the
item specifications, item writing, informal
pilot testing, and item editing. Complete by
February, 1996.

Task 7: Perform content committee review
and revisions as necessary. Complete by
March, 1996.

Task 8: Produce camera-ready copy for
formal field tryouts. Complete by July, 1996.

Task 9: Field test items first time in Fall on
students in Grade 10. Complete by early Fall,
1996.

Task 10: Prepare and disseminate descriptive
information and sample test items to assist
in preparing teachers, students, and parents.
Complete by early Fall, 1996.

Task 11: Develop and adopt rules governing
test administration, scoring, and reporting.
Complete by Spring, 1997.

Task 12: Analyze field test and revise items as
necessary for second field test. Complete by
Spring, 1997.

Task 13: Conduct second field test. Complete
by early Fall, 1997.

Task 14: Revise items from second field test
as necessary. Select items for the required
number of forms needed for the first year of
the real testing from the subset of items that
did not need any substantive revision.
Complete by early Spring, 1998.

Task 15: Select operations contractor for
administration, scoring, and reporting.
Complete by early Spring, 1998.

Task 16: Conduct regional seminars for
school administrators and testing coordinators
on the administration, scoring, and
reporting procedures. Complete by early
Spring, 1998.

Task 17: Complete production of all necessary
materials for first tests and have them
ready for distribution. Complete by Summer,
1998.

Task 18: Administer first real test to tenth
graders (class of 2001). Complete in Fall, 1998.

Task 19: Score, analyze results of first administration,
and establish passing standards for
the first administration. Complete in late Fall,
1998. (As mentioned earlier, the panel recognizes
that a practical case can be made for
establishing the standards after the second
pilot.)

Task 20: Design and implement a plan for
releasing test results to the schools and the
general public. Complete in late Fall, 1998.

Task 21: Review and repeat steps above. Plan
extended timeline to include at least two
administrations per year for 10th through 12th
graders. Include time for equating procedures
for future test administrations. This task should
be carried out continuously.

Section IV:



Using Test Scores for 
Accreditation Purposes 

WE HAVE BEEN GIVEN COPIES OF THE 
"Accreditation Requirements of the 
State Board of Education" (Bulletin 171), 
and have had the opportunity to meet with indi- 
viduals in Mississippi who are considering revisions 
(refinements) to these requirements. The new 
assessment program, including a revised exit exam 
and other additional testing in the schools, should 
be considered while making these revisions. 

One issue has to do with what standard to use in
setting performance standards for all the measures.
Previously, the accreditation standards on the FLE
and the Stanford Achievement Test used average
scaled scores. Holding districts accountable for
raising average scaled scores may provide incentives
for the district that are incompatible with the
purpose of the exit test, and indeed, may be incompatible
with how some educators would like to see
resources expended for achieving the standard on
the elementary school achievement tests. It may be
easier to raise average scaled scores by concentrating
instructional attention on those who already
score above the standard for graduation. That is,
schools may be rewarded for allocating resources
primarily to assist individuals who would have
passed the MAAP on the first attempt anyway
instead of providing help to lower achieving
students who are at risk of not graduating.
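
The incentive problem can be illustrated with a
small numerical sketch. The scaled scores, the cut
score of 300, and the function name below are
hypothetical values chosen for illustration only;
they are not data or scale values from the FLE or
the MAAP.

```python
from statistics import mean

CUT_SCORE = 300  # hypothetical passing standard, not an actual MAAP value


def summarize(scores, cut=CUT_SCORE):
    """Return (average scaled score, proportion scoring at or above the cut)."""
    passing = sum(1 for s in scores if s >= cut)
    return mean(scores), passing / len(scores)


# Hypothetical scaled scores for six students.
baseline = [250, 270, 290, 310, 330, 350]

# Strategy A: add 20 points to each of the three students already passing.
boost_high = [250, 270, 290, 330, 350, 370]

# Strategy B: add 20 points to each of the three students below the cut.
boost_low = [270, 290, 310, 310, 330, 350]

for label, scores in [("baseline", baseline),
                      ("boost high scorers", boost_high),
                      ("boost low scorers", boost_low)]:
    avg, rate = summarize(scores)
    print(f"{label}: average = {avg}, pass rate = {rate:.0%}")
```

In this sketch both strategies raise the average by
the same amount, so a standard based on average
scaled scores cannot tell them apart. Only the
second strategy changes the proportion of students
who reach the passing standard.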



Recommendation 60: 

In the accreditation system, the "success of the
school system" could and perhaps should be
defined in terms of the number of students who
demonstrate the desired level of performance
rather than in terms of average scores. In any
case, to maintain the integrity of purpose for
the MAAP, the standards at least for that exam
should relate to the proportions of students who
are successful on a specified attempt.



Assuming the above recommendation is followed, 
another issue to consider is whether schools should 
be held accountable for the percent of students 
who pass the MAAP on the first try or the cumula- 
tive percent who have passed on some future 
attempt. Individuals attain desired levels of 
achievement at different rates. Some individuals 
need more time than others to demonstrate the 
desired levels of competence. Holding schools 
accountable for student performance on the first 
attempt seems to run counter to the belief that 
students learn at different rates, and the role of the 
school (particularly with respect to the MAAP) is to 
help as many students as possible to eventually pass. 
(This would not have been a major issue for the
FLE, because almost everybody passed on the first
attempt. We do not anticipate that occurring on the
MAAP.)
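
A minimal sketch of how a cumulative pass rate
differs from an early pass rate follows. It assumes
a simplified record for each student, namely the
grade in which the student first passed (or None if
the student has not yet passed). The cohort and the
function name are hypothetical.

```python
def cumulative_pass_rate(first_pass_grade, through_grade):
    """Proportion of the cohort that has passed by the end of through_grade.

    first_pass_grade holds, for each student, the grade in which the
    student first passed, or None if the student has not yet passed.
    """
    passed = sum(
        1 for g in first_pass_grade if g is not None and g <= through_grade
    )
    return passed / len(first_pass_grade)


# Hypothetical cohort of eight students.
cohort = [10, 10, 10, 10, 11, 11, 12, None]

for grade in (10, 11, 12):
    rate = cumulative_pass_rate(cohort, grade)
    print(f"Passed by end of grade {grade}: {rate:.1%}")
```

For this made-up cohort the rate rises from 50.0%
at the end of grade 10 to 75.0% and then 87.5% in
later grades. A standard tied only to the earliest
figure would give the school no credit for the
students it helps to pass later.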



Recommendation 61: 

The Mississippi Department of Education may
wish to consider changing the attempt after
which schools are held accountable for a specified
proportion of students passing the test. We
believe it would be preferable to use the cumulative
proportion who have passed at the end of
grade 10, 11, or at the end of grade 12. Another
possible consideration would be to use a standard
that included the percent passing at two
different grades (e.g., 75% pass rate in each test
by the end of 10th grade and 85% pass rate after
11th grade). 29



It has been proposed that the MAAP be administered
for the first time at grade 10. The MAAP will
have very high stakes for students who cannot 
graduate without performing satisfactorily on the 
tests. Results from these tests will also have high 
stakes for the high school education community. 
However, the lower elementary and middle schools 
will be concerned primarily about preparing 
students to do well on the norm-referenced tests 
included in the school accreditation system. In the 
past, norm-referenced tests administered in the 
lower grades have been weighted three times as 
heavily as the FLE (i.e., there were three performance
standards for the NRT and only one for the
FLE). With the change in 1994-1995 to administering
norm-referenced tests in grades 4-9, the norm-referenced
test results could be thought of as
counting six times as much as the FLE (or the new
MAAP when it starts counting for accreditation).
Indicators used in the accreditation system will 
drive at all levels what teachers teach and what 
students learn. The Department needs to be 
concerned that whatever elementary and middle 
schools are held accountable for teaching and 
students for learning be similar to, and/or provide 
a solid foundation for, what is measured by the new 
exams. Otherwise, elementary and middle schools 
could be preparing students very well for the tests 
administered at the lower grades and be rewarded 
for doing a good job, but find that students entering
10th grade are not well prepared to pass the exit
exam, and some may not graduate as a result. If this 
should occur, both students and high schools will 
pay an extremely high price because of the lack of 
alignment. 



Recommendation 62: 

Performance standards established for the 
accreditation of school districts should be 
appropriately aligned and weighted. MDE 
should study carefully the alignment and 
weighting of performance standards used across 
the elementary, middle, and high school grades.



The Performance Standards for accreditation are
defined in Bulletin 171, Revised 1994, pages 29-31.
The "annual minimum value" (AMV) for the
criterion-referenced tests (presumably including
FLE and the end-of-course exam in Algebra) is set
at a point that is one-half of an individual standard
deviation below the mean score for all students
tested, but this AMV is not allowed to fall below
70% correct on any of the criterion-referenced
tests. Although it is possible to build tests intending
to have specifications resulting in similar mean
percentage correct scores, there is no indication
that such a specification will be given much priority
when a new exit exam or when new end-of-course
exams are built. Furthermore, good test
construction should not pay attention to such a
requirement. Again, since the FLE has a ceiling
effect for all three tests (even on the first attempt),
this requirement for a minimum 70% or 80%
correct regardless of what is measured or how it is
measured has not been a problem on the FLE. With
the introduction of the (assumed) more rigorous
MAAP, however, MDE may find that maintaining
the same minimum average percent correct across
the three test areas is just not sensible. There are
simply no a priori reasons why the expectations for
the students' performances should be the same
across three different outcomes in different curricular
areas. (Note: if the MDE should change its
performance standards to include cumulative pass
rates, as we recommend above, our concern
would shift to how the passing standards are
established for each test area. Again, setting the
same percentage correct raw score for all three test
areas is not likely to be appropriate.)
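
The effect of the 70% floor can be seen in a short
sketch of the AMV rule as we read the description
above: the mean minus one-half of a standard
deviation, with a floor of 70% correct. The exact
computation used in Bulletin 171 may differ. The use
of a population standard deviation, the function
name, and the percent-correct scores are assumptions
made for illustration.

```python
from statistics import mean, pstdev


def annual_minimum_value(percent_correct, floor=70.0):
    """Mean minus half a standard deviation, but never below the floor."""
    center = mean(percent_correct)
    # Assumption: population standard deviation of individual student scores.
    spread = pstdev(percent_correct)
    return max(center - 0.5 * spread, floor)


# Hypothetical percent-correct scores on an easier and a harder test.
easier_test = [80, 85, 90, 95, 100]
harder_test = [40, 50, 60, 70, 80]

print(annual_minimum_value(easier_test))  # roughly 86.5, the floor is not reached
print(annual_minimum_value(harder_test))  # the 70.0 floor applies
```

On the harder hypothetical test the floor of 70
lies above the mean of 60 for all students tested,
so the minimum no longer depends on how students
actually performed. This is the difficulty that
arises when one fixed percent correct is applied to
tests of differing rigor.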



Recommendation 63: 

Remove from the revised performance standards 
any reference to minimum percent correct. 



We infer from reading Bulletin 171 that the End-of-Course
Exams will be included as performance
standards in the Accreditation System as they are
implemented. As such, these exams will have high
stakes for schools but not necessarily for students.
The Department needs to think carefully about the
implications of this. Is this setting up a potential 
conflict between how the districts and how the 
parents/students would like to see resources 
allocated? Are districts going to be allowed to use 
the results of the end-of-course tests at the indi- 
vidual level (e.g., by allowing students to count their 
scores on such tests as a part of their course grades)? 
If some schools do allow the results to be used at the 
individual student level and other schools do not, 
how will this differential impact on students and 
their motivation to take the tests seriously impact 
the fairness of the accountability system? 

As more of these end-of-course exams are added,
the weight of the exit exams in the accreditation
system will be reduced even further. It is conceivable
that a student may have passed an end-of-course
exam at grade 9 at a level demonstrating
performance superior to that required to pass the
more generic exit exam in the broad curricular
area, but still be required to sit in 10th grade for the
entire exit exam, including the generic test in a
subject area already tested at grade 9.
Recommendation 64:

As new end-of-course exams are brought on line,
MDE should study the extent to which these
exams measure knowledge and/or skills included
in the exit exam, review and perhaps
expand the purpose for administering these
tests, and evaluate again the proper weights
such tests should have as compared to other
performance standards included in the accreditation
system (especially as compared to the tests
students are required to pass in order to graduate).



Finally, we wish to make a comment about the 
constructed response sections on the norm-refer- 
enced tests administered in grades 4-9. As we 
understand the current plan, results on these 
sections are not to be counted in an accreditation
system. This may not be wise. If they are not 
counted, the reforms that the MDE seeks may not 
be realized. 



Recommendation 65: 

Review the performance standards for the NRTs 
in grades 4-9 to determine whether or not it is 
possible and advisable to incorporate results 
from the constructed response sections as indica- 
tors in the accreditation system. 



Section V: 



Conclusions 

WE HAVE DISCUSSED A NUMBER OF 
issues, offered a number of recommen- 
dations, and presented illustrative tasks 
to be performed with suggested completion dates 
for a state-mandated high school graduation test. 
We have also made some recommendations with 
respect to using this test and others for accredita- 
tion purposes. 

It is clearly possible to develop a well-designed high
school graduation test that meets curriculum,
psychometric, educational, legal, administrative,
and resource requirements. However, as this
document has undoubtedly made clear, the task is
not easy. For the task to be done well, a variety of
steps need to be completed. For these steps to be
completed, adequate funding needs to be made
available.

While our recommendations will not all be re- 
peated here, we point out below some of the aspects 
that have been considered in the report. 

• It is legally inappropriate to hold students
accountable for passing an assessment that
covers material that they have not been taught.
This makes using a high-stakes graduation
assessment to drive curricular change some-
what troublesome. One can, however, use the
announcement of an upcoming assessment to drive
curricular change. This, of course, requires that
there be considerable time between the an-
nouncement of the assessment and its imple-
mentation.

• Multiple-choice items can measure higher- 
order thinking skills and procedures. Perfor- 
mance assessments may not offer high enough 
psychometric qualities to be used for high 
stakes assessments. Mississippi certainly should 
not use performance assessments to measure 
those competencies that can be assessed with
multiple-choice items. 

• It is unlikely that any "off-the-shelf" test would
be an acceptable high school exit test for the 
students of Mississippi. 

• Requiring any national norm-referencing 
component of the exit exam would complicate 
the task of maintaining curricular validity for
the test. 

• There must be close articulation among the 
various assessment programs. They should not 
work at cross purposes, and if they are serving 
the same purposes, perhaps less assessment is 
needed. 

• The use of the various tests in a performance-
based accreditation model requires careful
thought regarding how to set the performance
level and what metric to use in setting the level
(e.g., average performance or percentage of
students above some cut score).
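
To illustrate why the choice of metric matters, the following minimal sketch compares the two metrics named in the last point for two hypothetical schools. The scores, the cut score of 70, and the function names are all invented for illustration and do not come from Mississippi data. We also assume that a score equal to the cut counts as passing.

```python
from statistics import mean


def average_performance(scores):
    """Mean score across all tested students in a school."""
    return mean(scores)


def percent_above_cut(scores, cut):
    """Percentage of students scoring at or above the cut score.

    Treating a score equal to the cut as passing is an assumption.
    """
    passing = sum(1 for s in scores if s >= cut)
    return 100.0 * passing / len(scores)


CUT = 70  # hypothetical cut score

# Hypothetical score distributions, invented for illustration only.
school_a = [95, 95, 95, 60, 60, 60]  # a few very high scores, half below the cut
school_b = [72, 72, 72, 72, 72, 65]  # most students just above the cut

for name, scores in (("School A", school_a), ("School B", school_b)):
    print(name,
          round(average_performance(scores), 1),
          round(percent_above_cut(scores, CUT), 1))
```

In this invented case School A ranks higher on average performance while School B ranks higher on the percentage of students above the cut score, so the two metrics would lead to different accreditation judgments.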

1 Note that all recommendations follow rather than precede the relevant discussion.

2 One of the non-panel reviewers has pointed out that restricting the assessments to reading, mathematics,
and written communication may not fit totally with the Superintendent's exhortation that the assessment
be designed in such a way that education will have no alternative but to change dramatically. Further, the
major stakeholders external to the public education arena may have expectations in additional areas. While
we recognize this, it seems prudent to begin at a fairly modest level. As our recommendation points out,
additional areas may be added at a later date.

3 It is unclear to us how the meshing of English/Language Arts and Reading into one curriculum will
impact the decision to have separate exit exams in reading and written communication, although one of the
non-panel reviewers of the previous draft posited that this would not likely be a problem.

4 One non-panel reviewer of the previous draft eloquently articulated the need to focus on the "commit-
ment to dramatic educational change." Another of the non-panel reviewers suggested much the same
thing. As that reviewer stated, "I would suggest that we should be looking at criteria for assessment that
identify those skills, knowledge, attitudes, and applications of knowledge that students should have.
Whether those things are currently taught is a distractor. Assume they are not! Now the challenge becomes
the creation of an assessment whose curricular and instructional validity will be established over time. The
impact is that such a test and its initial results must be used as baseline for school improvement, and not for
accountability." We would like to make clear that our stance is not against curricular change or high stan-
dards. However, there are legal and moral issues at stake when one implements a high-stakes test over
untaught material and deprives some students of a diploma. As we point out in point 1 in the conclusions
section, the change must precede the implementation of the test. An announcement of a high-stakes test
over new and demanding content to be implemented in the future might legitimately serve as a catalyst for
curricular change. Using the test initially as a catalyst for school improvement is acceptable. What is not
acceptable is to use it for student accountability prior to establishing that students have had the opportu-
nity to learn the material.

5 One non-panel reviewer suggested that what should be stressed is the joint nature of the responsibility.
Both the state and the local districts are responsible. We concur, but our report is primarily focused on the
state responsibilities.

6 If future documents are more like what we are describing as frameworks, they should be referred to as
such. If they are more like the current documents, we believe they should continue to be called curriculum
structures.

7 One non-panel reviewer stressed the importance of this recommendation and pointed out "that any
effort to save funds by not carefully monitoring contractor work on a daily basis portends real problems. 
...the analogy to daily site visits when building a new home would not be an exaggeration." 

8 Any such items would need to be included as a part of subsequent item tryout and pilot studies.

9 One non-panel reviewer self-described as "a strong advocate of performance assessment" commented as
follows with respect to recommendations 16 and 17. "I concur with your conclusion in Recommendation 16 
about performance assessment and high stakes individual test scores. Our experience has been that perfor- 
mance assessments can produce valid, reliable scores at the grade and content level by school. Use for high 
stakes individual graduation requirement testing is problematic at the present time, particularly because of 
the psychometric demands of number of independent measures and testing time necessary to obtain such 
measures. If Mississippi is, however, going to use the assessments for school improvement purposes, perfor- 
mance assessment offers a richness and opportunity for curricular integration that should be seriously 
considered." We would concur. It is in the context of using performance assessment in high-stakes exit 
examinations that our psychometric cautions should be heeded most carefully. Of course, even for school 
improvement purposes, one desires accurate assessment and — other things being equal — low costs are 
preferable to high costs. 

10 See our related discussion of timing in the Legal Issues section on p. 40.

11 A partial compensatory model may also have implications for other technical considerations such as
reliability, scaling, and equating. One cannot look at the cut score process in isolation.

12 One non-panel reviewer of the previous draft questioned whether the majority should be teachers.

13 The composition of this committee is discussed later in the report.

14 The training of this committee and the running of the item sensitivity reviews could be made a part of
the test development contract. However, thought should be given to whether this overview could be per-
ceived as a conflict of interest for the contractor inasmuch as the contractor wrote the original items and
may be perceived as having a vested interest in keeping the items.

15 This is not technically a reliability estimate; it is an estimate of the consistency of scoring. However, it
often is referred to as interrater reliability.

16 We do not have full technical details concerning how the scaled scores for the FLE were derived. We
assume the scores are linear transformations from either logits or raw scores setting the scaled score of 233
at the cut score (70% correct) and the scaled score of 211 at 60% correct.
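
As a minimal sketch of what this assumption implies, the two anchor points stated in this note fix a straight line if the transformation is taken to run from percent correct (rather than from logits). The function name and its parameters are hypothetical, and the actual derivation of the scaled scores may differ.

```python
def scaled_score(percent_correct,
                 anchor_low=(60.0, 211.0),
                 anchor_high=(70.0, 233.0)):
    """Linear scaled score through two (percent correct, scaled score) anchors.

    Assumes a linear transformation from percent correct; this is an
    assumption, not the documented scaling procedure.
    """
    (x0, y0), (x1, y1) = anchor_low, anchor_high
    slope = (y1 - y0) / (x1 - x0)  # 22 scaled points per 10 percentage points
    return y0 + slope * (percent_correct - x0)


# The two anchors are reproduced by construction.
print(scaled_score(60), scaled_score(70))
```

Under this assumption each additional percentage point correct is worth 2.2 scaled-score points, which is equivalent to scaled score = 2.2 x (percent correct) + 79.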

17 Assessments for out-of-school adults should probably be under the control of the LEAs and be done at
those same sites. Otherwise, the security problems increase immensely. 

18 It is possible that the child has not had an initial opportunity to learn the required content and skills.
Thus, remediation may not be precisely the correct term. Also, we should point out that appropriate early 
assessments to identify weaknesses coupled with developmental instructional efforts should make reme- 
diation after the test less necessary than if such early detection and intervention efforts do not occur. 

19 Ideally the additional instruction should be provided in a manner and at times that do not take away
opportunities to learn in other domains. 

20 One non-panel reviewer commented that "a strategy worthy of consideration might be to have a notifi-
cation which must be signed by both the student and parent that clearly identifies that both parent and
student understand that the passage of the test is a graduation requirement."

21 Actually, if the first field testing of the items takes place in the fall of 1996, there is no reason not to notify
them as of that date. 

22 See our final section on accreditation regarding use of the MAAP for that purpose.

23 An example of what may be either incomplete documentation or inadequate communication is on the
source and date of some of the documents we were given to review. Ideally, every document should be dated
and the source of the document should be evident.

24 A disadvantage of this approach is that the members of the bias committee may not feel as free to make
comments about which items they believe are biased if they cannot do so anonymously.

25 One member of the panel believes the technical advisory committee should recommend the cut score.
The remainder of the panel believes that it is important that the cut score be recommended by Mississippi 
educators. Either way, we all agree that the commonly used expression "setting" the cut score is a slight 
misnomer. Really, the committee makes a recommendation which is forwarded to other groups. The actual 
setting of the cut score is done by a governmental agency that has the power to make such a determination. 
That agency uses the information from the standard setting committee. 

26 One member of the panel strongly recommends using a single contractor (with three or four phases of
the contract) for test development, field testing and test implementation. For continuity reasons this would 
be preferable if one can find a contractor that indeed is best at all these different components. 

27 One could also consider progress toward a goal as a criterion in the accreditation process. 

28 This recommendation assumes that the MDE will maintain cumulative pass rate information.